Spatial Analysis Methods and Practice
This is an introductory textbook on spatial analysis and spatial statistics through
GIS. Each chapter presents methods and metrics, explains how to interpret
results and provides worked examples. Topics include:
• Describing and mapping data through exploratory spatial data analysis
• Analyzing geographic distributions and point patterns
• Spatial autocorrelation
• Cluster analysis and multivariate data
• Geographically weighted regression and linear regression
• Spatial econometrics
The worked examples link theory to practice through a single real-world case
study, with software and illustrated guidance.
• Exercises are solved twice: first through ArcGIS and then GeoDa.
• Through a simple methodological framework, the book describes the dataset, explores spatial relations and associations and builds models.
• Results are critically interpreted, and the advantages and pitfalls of using various spatial analysis methods are discussed.
This is a valuable resource for graduate students and researchers analyzing geospatial data through a spatial analysis lens – including those using GIS in the environmental sciences, geography and social sciences.
After completing his postdoctoral studies in the United States, George Grekousis now teaches geography-related courses as an associate professor in China. His interdisciplinary research focuses on spatial analysis, geodemographics, and artificial intelligence. Dr. Grekousis has been awarded several grants from well-known international bodies, and his research has been published in several leading journals, including Computers, Environment and Urban Systems, PLOS One, and Applied Geography.
"...the perfect introduction to the emerging field of spatial data science. It is clearly written, with realistic and carefully worked-out examples and based on a sound pedagogical approach."
–Luc Anselin, Director, Center for Spatial Data Science at the University of Chicago, and creator of the GeoDa software
"Highly valuable and timely book for multidisciplinary professionals and students who aim to work with spatial problems yet do not yet have the tools to study and solve these. The book provides an excellent introduction to the concepts and tools to think and analyze spatially, complemented by practical, realistic examples of how to apply this knowledge. The book has sufficient depth and rigor to allow students at all levels to learn for themselves and reach a good comprehension of a wide variety of aspects within this scientific domain."
–Professor Walter T. de Vries, Technical University of Munich
"...an excellent blend of key theoretical concepts and applications. It covers a wide range of spatial topics and concepts while progressively building up in difficulty. The engaging examples, demonstrative code, and laboratory follow-up exercises make this book suitable for both self-learners and traditional academic settings. Highly recommended."
–Giorgos Mountrakis, State University of [REDACTED LOCATION] College of Environmental Science and Forestry
"...introduces contemporary spatial analysis in a way that takes the reader from an elementary position to advanced topics such as spatial econometrics. An excellent course text for students of GIS, spatial statistics, quantitative geography, and ecology. This is one of the first syntheses of spatial analysis that develops the subject around the basic notion that spatial relationships lie at the heart of understanding the correlations that define our geographic world. Essential reading for beginning students as well as those who wish to refresh their knowledge with respect to newer tools such as geographically weighted regression and spatial econometrics ...introduces spatial analysis to those with very little training in statistics while at the same time developing applications using standard software for spatial analysis based on the ArcGIS and GeoDa software systems. An excellent primer for anyone following a full course in spatial analysis. Spatial analysis is a tough subject to teach, but Grekousis guides the reader through the basic ideas about understanding how correlations define our geographic world, introducing the full range of spatial tools and models."
–Michael Batty, Centre for Advanced Spatial Analysis (CASA), University College [REDACTED LOCATION] (UCL)
"A much welcomed and timely addition to the bookshelf of practitioners interested in the quantitative analysis of geographical data. The book offers a clear and concise exposition to basic and advanced methods and tools of spatial analysis, solidifying understanding through worked real-world case studies based on state-of-the-art commercial (ArcGIS) and public-domain (GeoDa) software. Definitely a book to be routinely used as a reference on the practical implementation of key analytical methods by people employing geographical data across a wide spectrum of disciplines."
–Phaedon Kyriakidis, Cyprus University of Technology
Spatial Analysis Methods
and Practice
Describe – Explore – Explain through GIS
GEORGE GREKOUSIS
Sun Yat-Sen University (SYSU)
With solved examples in ArcGIS, GeoDa and GeoDa Space
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, [REDACTED LOCATION], NY [REDACTED PHONE], USA
[REDACTED PHONE] Williamstown Road, Port Melbourne, VIC [REDACTED PHONE], Australia
[REDACTED PHONE]–[REDACTED PHONE], 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – [REDACTED PHONE], India
[REDACTED ADDRESS], #06–04/06, Singapore [REDACTED PHONE]
Cambridge University Press is part of the University of Cambridge.
It furthers the University's mission by disseminating knowledge in the pursuit of
education, learning, and research at the highest international levels of excellence.
www.cambridge.org
Information on this title: www.cambridge.org/[REDACTED PHONE]
DOI: [REDACTED PHONE]/[REDACTED PHONE]
[REDACTED COPYRIGHT] George Grekousis[REDACTED PHONE]
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published[REDACTED PHONE]
Printed in the United Kingdom by TJ International, Padstow Cornwall
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Grekousis, George, author.
Title: Spatial analysis methods and practice : describe - explore - explain through GIS / George
Grekousis, Sun Yat-Sen University (SYSU), China.
Description: First edition. | [REDACTED LOCATION], NY : Cambridge University Press, [REDACTED PHONE]. |
Includes bibliographical references and index.
Identifiers: LCCN[REDACTED PHONE] (print) | LCCN[REDACTED PHONE] (ebook) | ISBN[REDACTED PHONE] (hardback) | ISBN[REDACTED PHONE] (paperback) | ISBN[REDACTED PHONE] (epub)
Subjects: LCSH: Spatial analysis (Statistics) | Geographic information systems.
Classification: LCC QA278.2 .G737[REDACTED PHONE] (print) | LCC QA278.2 (ebook) | DDC[REDACTED PHONE]–dc23
LC record available at [REDACTED URL] PHONE]
LC ebook record available at [REDACTED URL] PHONE]
ISBN[REDACTED PHONE][REDACTED PHONE]-2 Hardback
ISBN[REDACTED PHONE][REDACTED PHONE]-4 Paperback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Contents

Preface

1 Think Spatially: Basic Concepts of Spatial Analysis and Space Conceptualization
   Learning Objectives
   1.1 Introduction: Spatial Analysis
   1.2 Basic Definitions
   1.3 Spatial Data: What Makes Them Special?
   1.4 Conceptualization of Spatial Relationships
   1.5 Distance Measure
      1.5.1 Fixed Distance Band (Sphere of Influence)
      1.5.2 Distance Decay
   1.6 Contiguity: Adjacency Matrix
      1.6.1 Polygons Contiguity
      1.6.2 Adjacency Matrix
   1.7 Interaction
   1.8 Neighborhood and Neighbors
      1.8.1 k-Nearest Neighbors (k-NN)
      1.8.2 Space–Time Window
      1.8.3 Proximity Polygons
      1.8.4 Delaunay Triangulation and Triangular Irregular Networks (TIN)
   1.9 Spatial Weights and Row Standardization
   1.10 Chapter Concluding Remarks
   Questions and Answers
   Lab 1 The Project: Spatial Analysis for Real Estate Market Investments
      Overall Progress
      Scope of Analysis
      Dataset Structure
      Guidelines
      Section A ArcGIS
         Exercise 1.1 … Study Region
      Section B GeoDa
         Exercise 1.1 … Study Region

2 Exploratory Spatial Data Analysis Tools and Statistics
   Learning Objectives
   2.1 Introduction in Exploratory Spatial Data Analysis, Descriptive Statistics, Inferential Statistics and Spatial Statistics
   2.2 … Statistics for Visualizing Spatial Data (Univariate Data)
      2.2.1 Choropleth Maps
      2.2.2 Frequency Distribution and Histograms
      2.2.3 Measures of Center
      2.2.4 Measures of Shape
      2.2.5 Measures of Spread/Variability – Variation
      2.2.6 Percentiles, Quartiles and Quantiles
      2.2.7 Outliers
      2.2.8 Boxplot
      2.2.9 … Plot
   2.3 … Statistics for Analyzing Two or More Variables (Bivariate Analysis)
      2.3.1 Scatter Plot
      2.3.2 … Plot Matrix
      2.3.3 Covariance and Variance–Covariance Matrix
      2.3.4 Correlation Coefficient
      2.3.5 Pairwise Correlation
      2.3.6 … Plot
   2.4 Rescaling Data
   2.5 … Statistics
      2.5.1 Parametric Methods
      2.5.2 Nonparametric Methods
      2.5.3 Confidence Interval
      2.5.4 Standard Error, Standard Error of the Mean, Standard Error of Proportion and Sampling Distribution
      2.5.5 Significance Tests, Hypothesis, p-Value and z-Score
   2.6 Normal Distribution Use in Geographical Analysis
   2.7 Chapter Concluding Remarks
   Questions and Answers
   Lab 2 Exploratory Spatial Data Analysis (ESDA): Analyzing and Mapping Data
      Overall Progress
      Scope of the Analysis: Income and Expenses
      Section A ArcGIS
         Exercise 2.1 ESDA Tools: Mapping and Analyzing the Distribution of Income
         Exercise 2.2 Bivariate Analysis: Analyzing Expenditures by Educational Attainment
      Section B GeoDa
         Exercise 2.1 ESDA Tools: Mapping and Analyzing the Distribution of Income
         Exercise 2.2 Bivariate Analysis: Analyzing Expenditures by Educational Attainment

3 Analyzing Geographic Distributions and Point Patterns
   Learning Objectives
   3.1 Analyzing Geographic Distributions: Centrography
      3.1.1 Mean Center
      3.1.2 Median Center
      3.1.3 Central Feature
      3.1.4 Standard Distance
      3.1.5 Standard Deviational Ellipse
      3.1.6 Locational Outliers and Spatial Outliers
   3.2 Analyzing Spatial Patterns: Point Pattern Analysis
      3.2.1 Definitions: Spatial Process, Complete Spatial Randomness, First- and Second-Order Effects
      3.2.2 Spatial Process
   3.3 Point Pattern Analysis Methods
      3.3.1 Nearest Neighbor Analysis
      3.3.2 Ripley's K Function and the L Function Transformation
      3.3.3 Kernel Density Function
   3.4 Chapter Concluding Remarks
   Questions and Answers
   Lab 3 … Statistics: Measuring Geographic Distributions
      Overall Progress
      Scope of the Analysis: Crime Analysis
      Exercise 3.1 Measuring Geographic Distributions
      Exercise 3.2 Point Pattern Analysis
      Exercise 3.3 Kernel Density Estimation
      Exercise 3.4 Locational Outliers

4 Spatial Autocorrelation
   Learning Objectives
   4.1 Spatial Autocorrelation
   4.2 Global Spatial Autocorrelation
      4.2.1 Moran's I Index and Scatter Plot
      4.2.2 Geary's C Index
      4.2.3 General G-Statistic
   4.3 Incremental Spatial Autocorrelation
   4.4 Local Spatial Autocorrelation
      4.4.1 Local Moran's I (Cluster and Outlier Analysis)
      4.4.2 Optimized Outlier Analysis
      4.4.3 Getis-Ord Gi and Gi* (Hot Spot Analysis)
      4.4.4 Optimized Hot Spot Analysis
   4.5 Space–Time Correlation Analysis
      4.5.1 Bivariate Moran's I for Space–Time Correlation
      4.5.2 Differential Moran's I
      4.5.3 Emerging Hot Spot Analysis
   4.6 Multiple Comparisons Problem and Spatial Dependence
   4.7 Chapter Concluding Remarks
   Questions and Answers
   Lab 4 Spatial Autocorrelation
      Overall Progress
      Scope of the Analysis
      Section A ArcGIS
         Exercise 4.1 Global Spatial Autocorrelation
         Exercise 4.2 Incremental Spatial Autocorrelation and Spatial Weights Matrix
         Exercise 4.3 Cluster and Outlier Analysis (Anselin Local Moran's I)
         Exercise 4.4 Hot Spot Analysis (Getis-Ord Gi*) and Optimized Hot Spot Analysis
         Exercise 4.5 Optimized Hot Spot Analysis for Crime Events
      Section B GeoDa
         Exercises 4.1 and 4.2 Global Spatial Autocorrelation and Spatial Weights Matrix
         Exercise 4.3 Cluster and Outlier Analysis (Anselin Local Moran's I)
         Exercise 4.4 Hot Spot Analysis (Getis-Ord Gi*)

5 Multivariate Data in Geography: Data Reduction and Clustering
   Learning Objectives
   5.1 Multivariate Data Analysis
   5.2 Principal Component Analysis (PCA)
   5.3 Factor Analysis (FA)
   5.4 Multidimensional Scaling (MDS)
   5.5 Cluster Analysis
      5.5.1 Hierarchical Clustering
      5.5.2 k-Means Algorithm (Partitional Clustering)
   5.6 Regionalization
      5.6.1 SKATER Method
      5.6.2 REDCAP Method
   5.7 Density-Based Clustering: DBSCAN, HDBSCAN, OPTICS
   5.8 Similarity Analysis: Cosine Similarity
   5.9 Chapter Concluding Remarks
   Questions and Answers
   Lab 5 … Statistics: Clustering
      Overall Progress
      Scope of the Analysis
      Section A ArcGIS
         Exercise 5.1 k-Means Clustering
         Exercise 5.2 Spatial Clustering (Regionalization)
         Exercise 5.3 Similarity Analysis
         Exercise 5.4 Synthesis
      Section B GeoDa
         Exercise 5.1 k-Means Clustering
         Exercise 5.2 Spatial Clustering

6 Modeling Relationships: Regression and Geographically Weighted Regression
   Learning Objectives
   6.1 Simple Linear Regression
      6.1.1 Simple Linear Regression Assumptions
      6.1.2 … (Intercept and Slope by OLS)
   6.2 Multiple Linear Regression (MLR)
      6.2.1 Multiple Regression Basics
      6.2.2 Model Overfit: Selecting the Number of Variables by Defining a Functional Relationship
      6.2.3 Missing Values
      6.2.4 Outliers and Leverage Points
      6.2.5 Dummy Variables
      6.2.6 Methods for Entering Variables in MLR: Explanatory Analysis; Identifying Causes and Effects
   6.3 Evaluating Linear Regression Results: Metrics, Tests and Plots
      6.3.1 Multiple r
      6.3.2 Variation and Coefficient of Determination R-Squared
      6.3.3 Adjusted R-Squared
      6.3.4 Predicted R-Squared
      6.3.5 Standard Error (Deviation) of Regression (or Standard Error of the Estimate)
      6.3.6 F-Test of the Overall Significance
      6.3.7 t-Statistic (Coefficients' Test)
      6.3.8 Wald Test (Coefficient's Test)
      6.3.9 Standardized Coefficients (Beta)
      6.3.10 Residuals, Residual Plots and Standardized Residuals
      6.3.11 Influential Points: Outliers and High-Leverage Observations
   6.4 Multiple Linear Regression Assumptions: Diagnose and Fix
   6.5 Multicollinearity
   6.6 Worked Example: Simple and Multiple Linear Regression
   6.7 Exploratory Regression
   6.8 Geographically Weighted Regression
      6.8.1 Spatial Kernel Types
      6.8.2 Bandwidth
      6.8.3 Interpreting GWR Results and Practical Guidelines
   6.9 Chapter Concluding Remarks
   Questions and Answers
   Lab 6 OLS, Explanatory Regression, GWR
      Overall Progress
      Scope of the Analysis
      Exercise 6.1 Exploratory Regression
      Exercise 6.2 OLS Regression
      Exercise 6.3 GWR

7 Spatial Econometrics
   Learning Objectives
   7.1 Spatial Econometrics
   7.2 Spatial Dependence: Spatial Regression Models and Diagnostics
      7.2.1 Diagnostics for Spatial Dependence
      7.2.2 Selecting between Spatial Lag or Spatial Error Model
      7.2.3 Estimation Methods
   7.3 Spatial Lag Model
      7.3.1 Spatial Two-Stage Least Squares (S2SLS)
      7.3.2 Maximum Likelihood
   7.4 Spatial Error Model
   7.5 Spatial Filtering
   7.6 Spatial Heterogeneity: Spatial Regression Models
   7.7 Spatial Regimes
   7.8 Chapter Concluding Remarks
   Questions and Answers
   Lab 7 Spatial Econometrics
      Overall Progress
      Scope of the Analysis
      Exercise 7.1 OLS
      Exercise 7.2 Spatial Error Model
      Exercise 7.3 OLS with Spatial Regimes
      Exercise 7.4 Spatial Error by Spatial Regimes

References
Index
Preface
As spatial data become more and more widely available, and as location-based services, from smartphone applications to smart city monitoring, become a standard part of everyday human interaction and communication, a growing number of researchers, scientists and professionals, far beyond the traditional boundaries of the geography discipline, realize the need for in-depth analysis of georeferenced data. Although geographical information systems map and link attributes to locations, spatial data hold much more treasure than a glossy mapping representation. To unlock this information, spatial analysis is necessary, as it provides the methods and tools to transform spatial data into knowledge, assisting in enhanced decision making and better planning. As such, there is tremendous demand for accurately analyzing georeferenced data (including big data) across a wide range of disciplines.
To respond to this demand, Spatial Analysis Methods and Practice is an introductory book on spatial analysis and statistics through GIS. The book presents spatial data analysis methods and geoinformation analysis techniques for solving various geographical problems, following a "Describe – Explore – Explain" approach. Each chapter focuses on a single major topic, introduces the related theory, explains how to interpret the metrics' outputs in a meaningful way and, finally, provides worked examples.
The topics covered include:
• Chapter 1: Think Spatially: Basic Concepts of Spatial Analysis and Space Conceptualization (Exercises solved with ArcGIS, GeoDa)
• Chapter 2: Exploratory Spatial Data Analysis Tools and Statistics (Exercises solved with ArcGIS, GeoDa)
• Chapter 3: Analyzing Geographic Distributions and Point Patterns (Exercises solved with ArcGIS)
• Chapter 4: Spatial Autocorrelation (Exercises solved with ArcGIS, GeoDa)
• Chapter 5: Multivariate Data in Geography: Data Reduction and Spatial Clustering (Exercises solved with ArcGIS, GeoDa, Matlab)
• Chapter 6: Modeling Relationships: Regression and Geographically Weighted Regression (Exercises solved with ArcGIS, Matlab)
• Chapter 7: Spatial Econometrics (Exercises solved with GeoDa Space)
The book offers both a theoretical (Theory) and a practical (Lab) section for each chapter and adopts a "learn-by-doing" approach. Theory presents concepts, methods and metrics in detail, while Lab applies these metrics in step-by-step solved examples through ArcGIS and GeoDa. Matlab scripts are also offered for two labs.
Theory
Spatial analysis methods and techniques are described in a comprehensive and consistent way through the following subsections:
• Definition: Each subsection begins with the definitions of the methods to be presented. This allows for easy tracing of the definition of a new theory, concept or metric.
• Why Use: The "why use" statement follows. It offers an initial understanding of the importance of a method or metric and also presents the types of problems to which these methods and metrics are best suited.
• Interpretation: This subsection explains how we should interpret the outcomes of spatial analysis methods and metrics, going one step further than simply reporting numbers or maps without any critical discussion.
• Discussion and Practical Guidelines: This section discusses the pros and cons of each method and metric. It also provides valuable tips on how to implement them from a practical perspective. For example, guidelines are offered to assist in selecting appropriate parameter values (of statistics/metrics/tools), thus avoiding uncritical acceptance of the default values offered by software. Experimenting with various parameter values and settings allows for better insight into the impact of each parameter on the final outcome. Potential case studies are also presented.
• Concluding Remarks: A list of important remarks and guidelines is presented at the end of each chapter, summarizing the key topics of the theory.
• Questions and Answers: A set of 10 questions and answers is presented for self-evaluation.
Lab
Lab focuses on gaining hands-on practical experience through well-designed solved examples. All the main metrics included in Theory are presented in the Lab of each chapter. This allows readers to learn how to perform spatial analysis and report results through step-by-step ArcGIS or GeoDa commands. This section also strongly emphasizes how to critically interpret results so that spatial analysis leads to knowledge extraction, assisting in enhanced decision making and spatial planning.
A single worked example runs through the whole book. By working on a single case study, readers can delve deeper into the different approaches applied in spatial analysis. Chapter by chapter, readers will gain a better understanding of the study region, making the interpretation of results easier and more meaningful.
The general structure of each lab is as follows:
• Overall Progress: A workflow is presented at the beginning of each lab showing the progress of the entire project. The exercises that make up each lab, along with the tools to be used and the expected outcomes, are also presented graphically.
• Scope of Analysis: The problem to be solved is described.
• Actions: Step-by-step software guidance is provided to describe how to solve the problem and report results.
• Interpreting Results: Results are interpreted from the spatial analysis perspective and in relation to the problem at hand.
The book is a valuable resource for a wide audience and is not strictly addressed to geographers. Analysts, teachers, instructors, students of various majors and researchers from interdisciplinary fields who are eager to analyze geospatial data can benefit from this book. It provides the necessary concepts, methods, metrics and technical skills, through geospatial analysis tools (ArcGIS, GeoDa, and GeoDa Space), to study a variety of real-world problems pertaining to socioeconomic issues, locational analysis and planning, and human and urban analysis, and to efficiently assist public policy and decision making. No previous knowledge of spatial analysis is required.
I am grateful for the help and advice of so many scholars, but as omissions and mistakes are inevitable, I would greatly appreciate messages pointing out corrections or suggestions so that this book can be further improved. Errata will be published on the book's website.
1 Think Spatially
Basic Concepts of Spatial Analysis
and Space Conceptualization
THEORY
Learning Objectives
This chapter
• Presents the basic concepts, terms and definitions pertaining to spatial analysis
• Introduces a spatial analysis workflow that follows a describe–explore–explain structure
• Presents in detail the reasons that spatial data are special – namely spatial autocorrelation, scale, the modifiable areal unit problem, spatial heterogeneity, edge effects and the ecological fallacy
• Explains why conceptualization of spatial relationships is extremely important in spatial analysis
• Presents the approaches used to conceptualize spatial relationships
• Explains how distance, contiguity/adjacency, neighborhood, proximity polygons and the space–time window are used in space conceptualization
• Defines the spatial weights matrix, which is essential to almost every spatial statistic/technique
• Introduces the real-world project, along with the related dataset, to be worked on throughout the book
After a thorough study of the theory and lab sections, you will be able to
• Implement a comprehensive workflow when you conduct spatial analysis
• Distinguish spatial from nonspatial data
• Understand why spatial data should be treated with new methods (e.g., spatial statistics)
• Understand the importance of applying conceptualization methods according to the problem at hand
• Understand essential concepts for conducting spatial analysis such as distance, contiguity/adjacency, neighborhood, proximity polygons and space–time
• Describe the spatial analysis process to be adopted for solving the real-world project of this book
• Present the project's data with ArcGIS and GeoDa
1.1 Introduction: Spatial Analysis
"In God we trust. All others must bring data," said W. Edwards Deming (American statistician and professor), as without data there is little to be done. Counting objects or individuals and measuring their characteristics is the basis for almost every study. With the advent of geographic information systems (GIS), it is simple to link nonspatial data (e.g., income, unemployment, grades, sex) to spatial data (e.g., countries, cities, neighborhoods, houses) and create large geodatabases. In fact, when data are linked to location, analysis becomes more intriguing, and spatial analysis and the science of geography take over, as raw data are of little value. Analyzing data through spatial analysis methods and techniques allows us to add value by creating information and then knowledge. Within this context, spatial analysis can be defined in various ways:
• Spatial analysis is a collection of methods, statistics and techniques that integrate concepts such as location, area, distance and interaction to analyze, investigate and explain, in a geographic context, patterns, actions or behaviors among spatially referenced observations that arise as a result of a process operating in space.
• Spatial analysis is the quantitative study of phenomena that manifest themselves in space (Anselin, p. 2).
• Spatial analysis studies "how the physical environment and human activities vary across space – in other words, how these activities change with distance from reference locations or objects of interest" (Wang, p. 27).
• Spatial analysis is "the process by which we turn raw data into useful information, in pursuit of scientific discovery, or more effective decision making" (Longley et al.).
• Spatial (data) analysis is "a set of techniques designed to find pattern, detect anomalies, or test hypotheses and theories based on spatial data" (Goodchild).
• Spatial analysis is a broad term that includes (a) spatial data manipulation through geographical information systems (GIS), (b) spatial data analysis in a descriptive and exploratory way, (c) spatial statistics that employ statistical procedures to investigate whether inferences can be made and (d) spatial modeling, which involves the construction of models to identify relationships and predict outcomes in a spatial context (O'Sullivan & Unwin, p. 2).
Why Conduct Spatial Analysis?
Spatial analysis concepts, methods and theories make a valuable contribution to the analysis and understanding of
• Social Systems: Spatial analysis methods can be used to study how people interact in social, economic and political contexts, as space is the underlying layer of all actions and interconnections among people.
• Environment: Spatial analysis methods can be applied in studies related to natural phenomena and climate change hazards, natural resources management, environmental protection and sustainable development.
• Economy: Spatial analysis methods can be used to analyze, map and model interrelations among humans and the various dimensions of economic life.
The main advantage of spatial analysis is the ability to reveal patterns in data that had not previously been defined or even observed. For example, using spatial analysis techniques, one might identify the clustering of disease occurrences and then develop mechanisms for preventing their expansion or even eliminating them (Bivand et al.). In this respect, spatial analysis leads to better decision making and spatial planning (Grekousis).
In a broad sense, there are four types of spatial analysis:
• Spatial point pattern analysis: A set of data points is analyzed to trace whether it exhibits one of three states: clustered, dispersed or random. Consider, for example, a spatial arrangement of stroke events in a study area. Are they clustered in a specific region, or are strokes randomly distributed across space? Spatial analysis proceeds with further investigation, such as determining the driving factors that lead to this clustering (potentially the existence of nearby industrial zones and related pollution). Point pattern analysis also includes centrography, a set of spatial statistics used to measure the center, the spread and the directional trend of point patterns. In this type of analysis, data typically refer to the entire population and not to a sample.
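The exercises in this book are solved in ArcGIS and GeoDa; purely as a tool-agnostic sketch, the two most basic centrographic statistics (mean center and standard distance) can be computed directly from their definitions. The coordinates and function names below are illustrative, not from the book's case study:

```python
import math

def mean_center(points):
    """Mean center: the average x and average y of all points."""
    n = len(points)
    return (sum(x for x, _ in points) / n,
            sum(y for _, y in points) / n)

def standard_distance(points):
    """Standard distance: a single measure of how spread out
    the points are around their mean center."""
    mx, my = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - mx) ** 2 + (y - my) ** 2
                         for x, y in points) / n)

# Four hypothetical stroke-event locations (projected coordinates)
events = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
print(mean_center(events))        # (1.5, 1.5)
print(standard_distance(events))  # ≈ 0.707
```

A larger standard distance would indicate a more dispersed point pattern around the same center.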
• Spatial analysis for areal data: Data are aggregated into predefined zones (e.g., census tracts, postcodes, etc.), and analysis is based on how neighboring zones behave and whether relations and interactions exist among them (i.e., clusters of nonspatial data also form clusters in space). For example, do people with high or low income cluster around specific regions, or are they randomly allocated? Spatial dependence, spatial heterogeneity, spatial autocorrelation, space conceptualization (through a spatial weights matrix) and regionalization (spatial clustering) are central notions in this type of analysis.
• Geostatistical data analysis (continuous data): Geostatistical analysis is the branch of statistics analyzing and modeling continuous field variables (O'Sullivan & Unwin [REDACTED PHONE], p. [REDACTED PHONE]). In this respect, geostatistical data comprise a collection of sample observations of a continuous phenomenon. Using various geostatistical approaches (e.g., interpolation), values can be calculated for the entire surface. Pollution, for instance, is monitored by a limited network of observation locations. To estimate pollution for every single point, we may apply interpolation techniques through geostatistical analysis. Geostatistical analysis is not covered in this book.
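Although geostatistical analysis is outside this book's scope, the idea of estimating a surface from scattered samples can be illustrated with inverse distance weighting (IDW), one of the simplest interpolation techniques (strictly a deterministic method rather than a geostatistical one such as kriging). The station values below are invented:

```python
def idw(x, y, samples, power=2):
    """Inverse-distance-weighted estimate at (x, y) from
    (xi, yi, value) samples: nearer stations weigh more."""
    num = den = 0.0
    for sx, sy, value in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return value  # exactly at a monitoring station
        w = 1.0 / d2 ** (power / 2)
        num += w * value
        den += w
    return num / den

# Two hypothetical pollution monitoring stations: (x, y, measured value)
stations = [(0.0, 0.0, 10.0), (10.0, 0.0, 30.0)]
print(idw(5.0, 0.0, stations))  # midway, equal weights: ≈ 20.0
print(idw(2.0, 0.0, stations))  # pulled toward the nearer, low-value station
```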

--- Page 21 ---
• Spatial modeling: Spatial modeling deals mainly with how spatial
dependence, spatial autocorrelation and spatial heterogeneity can be
modeled in order to produce reliable, predicted spatial outcomes.
Spatial modeling can be used, for example, to model how the value of
a house is related to its location. Spatial regression and spatial econo-
metrics are key methods in spatial modeling.
Spatial Analysis Workflow
As spatial analysis is a wide discipline with a large variety of methods,
approaches and techniques, guidance on how to conduct such analysis is
necessary. This book introduces a new spatial analysis workflow that follows a describe–explore–explain structure in order to address what, where and why questions, respectively (see Figure 1.1).
Step A: Describe (What). This is the first step in the spatial analysis process. It describes the dataset through descriptive statistics. Descriptive statistics are used to summarize data characteristics and provide a useful understanding of the distribution of the values, their range and the presence of outliers.
Figure 1.1 Spatial analysis workflow.

--- Page 22 ---
This step typically answers “what?” questions, such as what is the mean income of a neighborhood, or what is the population proportion living under the poverty level? This step offers an initial understanding of the dataset and its specific characteristics. Still, if the data have not been collected appropriately, then no analysis can lead to accurate and useful results. For this reason, any dataset should be checked for consistency and accuracy before any deeper analysis takes place. Datasets without detailed reports explaining the methods used and the accuracies achieved should be avoided (always cite in your studies the link to the database used and the report that describes the methods used to collect the data, along with the associated quality controls).
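As a minimal, tool-agnostic sketch of Step A, the summary below uses Python's standard statistics module on an invented income attribute; a value far from the mean is flagged as a potential outlier for closer inspection (the two-standard-deviation screen is just one simple convention):

```python
import statistics as st

# Hypothetical per-capita income (thousands of dollars) for six zones
income = [21.5, 23.0, 19.8, 24.1, 22.7, 58.0]

mean = st.mean(income)
median = st.median(income)
stdev = st.stdev(income)
# Crude outlier screen: flag values more than 2 standard deviations from the mean
outliers = [v for v in income if abs(v - mean) > 2 * stdev]

print(round(mean, 2), median, round(stdev, 2))
print(outliers)  # the 58.0 zone stands out for inspection
```

Note how the single extreme value pulls the mean well above the median; comparing the two is itself a quick descriptive check.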
Step B: Explore (Where). In the second step, exploratory spatial data analysis (ESDA) is applied to explore and map data, locate outliers, test underlying assumptions or identify trends and associations among them, such as the presence of spatial autocorrelation or spatial clustering. In this step, we mostly answer “where?” questions, such as where are the areas with low/high values of income, is there any spatial clustering in the distribution of income per capita, where is it located, and where are the crime hot spots in a city?
Step C: Explain (Why/How). In the last step, explanatory statistical analysis through a spatial lens is applied to explain and understand causes and effects through models. In this step, we attempt to answer “why?/how?” questions. These methods do not just identify associations but also attempt to (a) unveil relations that explain why something happens and (b) trace the drivers behind a change. Typical questions in a geographic context include: why do crime events cluster in a specific area? Is there any link to the specific socioeconomic characteristics of this area? Why is income per capita linked to location, and how is income related to the size of a house? What are the driving forces behind sea-level rise, and how does population increase drive urban land cover changes? In this type of analysis, and in the context of this book, we treat variables as either independent or dependent. The dependent variable (effect) is the phenomenon/state/variable we attempt to explain. For example, if an analysis concludes that population increase (driver, independent variable) accounts for x% of urban land cover change (effect, dependent variable), then a linkage (a relation) is established that explains the degree to which the driver influences the effect. We have now built a model that explains why something happens, which can additionally be used for predictions. This is a step beyond steps A and B, which mostly address “what happens” or “where something happens.” Spatial regression and spatial econometrics will be described in this book concerning this stage of analysis. From the spatial analysis perspective, several additional questions could also be addressed at this stage: Can we learn something from this dataset and the applied methodology? Has new knowledge been created? What is the next step? How should future research proceed? When spatial analysis is completed, the knowledge created enhances decision making and spatial planning.

--- Page 23 ---
1.2 Basic Definitions
Box 1.1 More than 20 terms (in italics) related to spatial analysis, spatial statistics and spatial thinking (one more) have been mentioned in the preceding section. Some terms might be comprehensible, others entirely new and others quite vague. Let us start by building a common vocabulary and presenting some key definitions in this section. Definitions, terms and formulas typically vary among books, which confuses not only nonspecialists but scientists as well, causing much misunderstanding. This confusion has also hampered statistical, GIS and spatial analysis software, especially with respect to equations and formulas. This book presents the most commonly used names and symbols for terms and statistics.
Definitions
Spatial statistics employ statistical methods to analyze spatial data, quantify a spatial process, discover hidden patterns or unexpected trends and model these data in a geographic context. Spatial statistics can be considered part of ESDA, spatial econometrics and remote sensing analysis (Fischer & Getis [REDACTED PHONE], p. 4). They are largely based on inferential statistics and hypothesis testing to analyze geographical patterns so that spatially varying phenomena can be better modeled (Fischer & Getis [REDACTED PHONE], p. 4). Spatial statistics quantify and map what the human eye and mind intuitively see when reading a map depicting spatial arrangements, distributions or trends (Scott & Janikas [REDACTED PHONE], p. 27; see also Chapter 2).
Spatial modeling deals with the creation of models that explain or predict spatial outcomes (O'Sullivan & Unwin [REDACTED PHONE], p. 3).
Geospatial analysis is the collection of spatial analysis methods, techniques and models that are integrated in geographic information systems (GIS; de Smith et al. [REDACTED PHONE]). Geospatial analysis is enriched with GIS capabilities and is used to design new models or integrate existing ones in a GIS environment. It is also used as an alternative term for “spatial analysis” (de Smith et al. [REDACTED PHONE]); strictly speaking, however, spatial analysis is part of geospatial analysis (see Box 1.2).
Box 1.2 It is not always easy to distinguish between the terms “geographic,” “spatial” and “geospatial.” These terms have been defined by many experts within different scientific contexts. The definitions provided here are not exhaustive; they serve as a basis for a common terminology. Even within the science of geography, terms can overlap, and distinctions can be vague. The term “geographic” refers to a location relative to the earth's surface combined with some type of representation. On the other

--- Page 24 ---
hand, the term “spatial” does not refer solely to the earth's surface; its meaning is extended to a location combined with additional attribute data. The term “geospatial” is more computer oriented and refers to information based on both spatial data and models, combining geographical analysis with spatial analysis and modeling.
Spatial data refers to spatial entities with geometric parameters and a spatial reference (coordinates and coordinate system) that also have other nonspatial attributes (see Figure 1.2; Bivand et al. [REDACTED PHONE], p. 7). For example, we can
describe a city by its population, unemployment rate, income per capita or
the average monthly temperature. When these data are linked to a location
through spatial objects (e.g., city postcodes), then we get spatial data. The
range of attributes to be joined to the spatial objects depends on the problem
being studied and the availability of datasets (e.g., census). Images that are
georeferenced are also considered spatial data.
Conceptually, there are two ways to represent geographic entities and, as such, to represent the world digitally: the object view of the world and the field view of the world (Haining [REDACTED PHONE], p. [REDACTED PHONE]).
Object view is a representation that describes the world with distinctive spatial objects that are georeferenced to a specific location using coordinates. In the object view, spatial objects are modeled as points, lines or polygons (also called features). This data model is called the vector data model (O'Sullivan & Unwin [REDACTED PHONE], p. 6). The object view of the world and the vector data model can be used to map, for example, demographic or socioeconomic data. Spatial objects might be represented differently when the scale of analysis changes.
Figure 1.2 Spatial entities linked to attributes create spatial data.

--- Page 25 ---
For instance, a city might be represented as a point feature when examined on
a national scale and as a polygon feature at a more local level.
Field view is a representation that describes the world as a surface of continuously varying properties (O'Sullivan & Unwin [REDACTED PHONE], p. 7). The field view of the world is more appropriate for depicting a continuous phenomenon/property (e.g., temperature, pollution, land cover type, height). A way to record a field is through the raster data model. In this model, rectangular cells (called pixels), organized in a regular lattice, depict the geographic variation of the property studied. Another way to model fields is by using triangulated irregular networks (TINs). We can convert spatial data from one model to the other (i.e., vector to raster conversion or vice versa) according to the study's needs.
Variable is any characteristic an object or individual might have. For
example, age, height and weight are all variables characterizing humans,
animals, or objects.
Attributes are information carried by spatial data. They are stored as
columns in a GIS table. An attribute ﬁeld is equivalent to a variable in classic
statistics and has become the preferred term in GIS analysis, but these terms
can be used interchangeably. An attribute might be the population of a
postcode or the per-capita annual income in a census tract.
Data are produced when we measure the characteristics of objects or
individuals.
Value is the result of a measurement (or response) of a characteristic. In statistics, the term score is also used to describe a variable's value.
Outlier is an unusual, very small or very large value compared to the rest of the values.
Dataset is a collection of variables of any kind of objects. Typically, a spatial
dataset has a tabular format, whereby columns contain the attributes and rows
contain the spatial entities.
Population is the entire collection of observations/objects/measurements about which information is sought.
Sample is a part of the entire population.
Level of measurement of a variable describes how its values are arranged in relation to each other (de Vaus [REDACTED PHONE], p. 40). Variables/attributes are grouped at three levels of measurement: nominal, ordinal and interval or ratio (Haining [REDACTED PHONE], p. [REDACTED PHONE]; see Table 1.1).
• Nominal variables are variables with values that cannot be ordered. For example, race may be set as White = 1, Asian = 2, Hispanic = 3. This is a nominal variable, as the values “1, 2, 3” do not reveal rank but are used as labels for the various categories. We cannot add the values of two different objects like, let's say, “1 + 3 = 4,” as “4” does not reflect any meaningful value. Another example of nominal variables is the “Name of city” (e.g., Athens, [REDACTED LOCATION], [REDACTED LOCATION]) or the “Land Cover Name” (e.g., Forest, Urban, Water). This type of attribute provides descriptive

--- Page 26 ---
information and can be used to label polygons on a map. The applicable operators are “equal” or “not equal” (=, ≠).
• Ordinal variables are variables whose categories can be ordered but whose numerical differences are not meaningful and cannot be calculated. For example, the variable “Student” might get the following values: “Exceptional” = 1, “Good” = 2, “Need to study harder” = 3. We can order categories from top to bottom (or vice versa), but there is no meaning in subtracting (“Exceptional” − “Good” = −1). We can apply the operators “equal,” “not equal,” “larger than” and “smaller than” (=, ≠, >, <). A spatial entity's attributes measured at the nominal or ordinal level are also called “categorical.”
• Interval and ratio variables (also called “numerical”) are variables for which each observation can be expressed in a numerically meaningful way. Numbers are not used only as labels but may be used to calculate statistics (e.g., the average). If the values of a numerical variable are limited to specific categories, then the variable is a discrete numerical, also called interval. The interval level is a class of ratio level. In interval-level measurement, categories are defined by fixed distances. Interval data allow for the operations of addition and subtraction (Haining [REDACTED PHONE], p. [REDACTED PHONE]). Still, interval variables do not preserve ratios (O'Sullivan & Unwin [REDACTED PHONE], p. 13). Dichotomous variables (e.g., for the variable “sex,” an individual might be Male = 1, Female = 0, or the inverse) can also be regarded as discrete

Table 1.1 Level of measurement per data structure model (vector/raster) and examples per data type. Applicable logical and arithmetic operations are mentioned in parentheses. Many statistical procedures and techniques cannot be used at all levels of measurement, as different logical and arithmetic operations apply to different levels. For example, binary logistic regression is designed for dichotomous dependent variables and cannot be used for ratio variables. The level of measurement defines the pool of statistical procedures to be used later in the analysis. From the statistical perspective, more techniques can be used to analyze ratio variables than nominal and ordinal variables; thus, ratio variables are preferred (de Vaus [REDACTED PHONE], p. 43).
Point, Line and Polygon belong to the vector data model (object view); Pixel belongs to the raster data model (field view).

Level of measurement | Point | Line | Polygon | Pixel
Nominal (=, ≠) | City name | Road name | Postcode ID | Land cover type
Ordinal (=, ≠, >, <) | City most desirable to live (ranked) | Road classification type (Avenue, Highway) | Postcode classification according to education attainment | Forest land cover subclasses
Interval (=, ≠, >, <, +, −) | Poverty level | Width of road | Poverty level for a postcode | Ground temperature
Ratio (=, ≠, >, <, +, −, ×, /) | Population | Road freight | Postcode data: population, income per capita | Pollution PM2.5

--- Page 27 ---
interval-level variables. In this case, zero stands for the absence of something. If the set of possible values is not limited to some categories between low and high values, then the variable is a continuous numerical (also called “ratio”). Ratio variables have a meaningful zero. In ratio variables, we can use all operators (=, ≠, >, <, +, −, ×, /).
To analyze variables/attributes, statistical methods can be employed. There are three major branches of classic statistics: descriptive statistics, inferential statistics and explanatory statistics (Linneman [REDACTED PHONE], p. 20). We will deal with all of them in this book.
Descriptive statistics is a set of statistical procedures that summarize the basic characteristics of a given distribution. Descriptive statistics usually summarize a specific sample and are not appropriate for making inferences regarding the total population (unless we have the entire population at hand). As a result, they are not developed on the basis of probability theory, as inferential statistics are. In this sense, the results of descriptive statistics apply only to the specific dataset for which they have been calculated. Descriptive statistics make use of tables, graphs and simple statistical procedures (Linneman [REDACTED PHONE], p. 21).
Inferential statistics is the branch of statistics that analyzes samples to draw conclusions about the entire population. Typical approaches in inferential statistics include tests of significance (hypothesis testing), confidence intervals and Bayesian inference.
Explanatory statistics is the branch of statistics that uses methods and techniques to identify relations among variables and potentially “explain” causalities. In this type of statistics, variables are treated as dependent or independent (Linneman [REDACTED PHONE], p. 21). The dependent variable is what we attempt to explain through a set of independent variables. Regression analysis is typically used in explanatory statistics.
1.3 Spatial Data: What Makes Them Special?
Consider a realtor who stores the contact numbers of his clientele in his cell phone. These contacts are data stored in his phone's memory in a casual type of database. If these data are linked to location (in terms of coordinates or addresses through geocoding), then they are transformed into spatial data. Each contact is now attached to a single spatial object (e.g., a point denoting the home address of each client) that carries additional information (the attributes, such as phone number, name, date of birth or e-mail) and that can be mapped. Transforming data into spatial data offers a lot more than a glossy visualization. It allows for in-depth geoprocessing analysis and advanced spatial querying. For example, which is the closest client to a specific point, where do the majority of clients live, where are the clients who spend the most located, what is the best route to their homes, what percent of clients live within a zone of

--- Page 28 ---
1 km or 2 km from a predefined location, do clients cluster around specific neighborhoods, and what is the socioeconomic profile of these neighborhoods? Some questions cannot be addressed in a timely manner before linking contacts to locations (imagine having thousands of clients), while others are impossible to answer.
Analyzing spatial data through spatial analysis methods enriches our research by revealing hidden information. To unlock this information treasure, spatial data are analyzed using various descriptive, exploratory and explanatory statistical methods and techniques. The next chapters focus on these large classes of spatial techniques. Still, it may not be obvious whether and why spatial data are different from nonspatial data and why we need to adopt new methods to analyze them. Let us see why spatial data are special (see also Box 1.3).
First, spatial analysis implies a focus on many geographically related parameters, such as location, distance, shape, area, neighborhood, adjacency and interaction. The inclusion of geographical parameters in spatial analysis differentiates its data, methods and statistics from those of classical (nonspatial) data analysis (Anselin [REDACTED PHONE], p. 2).
Second, many of the conventional statistical approaches used to analyze
nonspatial data cannot be directly applied to spatial data because examining
spatial data involves the following problems:
• The existence of spatial autocorrelation/dependence. Many statistical tests used for nonspatial data are based on the hypothesis that samples are randomly selected and observations are independent (O'Sullivan & Unwin [REDACTED PHONE], p. 34). When we collect spatial data, however, this hypothesis is usually violated. This phenomenon is described as “spatial dependence.” Tobler's first law of geography explains this basic property that rules spatial data, stating that “everything is related to everything else, but near things are more related than distant things” (Tobler [REDACTED PHONE]). Spatial dependence is closely related to spatial autocorrelation. Spatial autocorrelation is the degree of spatial dependency, association or correlation between the value of an observation of a spatial entity and the values of neighboring observations (Dall'erba [REDACTED PHONE]). The existence of spatial autocorrelation in a spatial dataset is not necessarily a problem. If there were no spatial dependence among objects, what would be the reason for spatial analysis? In other words, if space did not make any difference, then geographical analysis would be of no interest. Spatial autocorrelation exists in many geographical problems, so we have to adopt specific tools to handle it. Spatial autocorrelation will be analyzed in detail in Chapter 4.
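To make the notion concrete ahead of Chapter 4, a standard index of spatial autocorrelation, global Moran's I, can be sketched directly from its definition; the four-zone chain and its weights below are invented for illustration:

```python
def morans_i(values, weights):
    """Global Moran's I: positive when similar values neighbor
    each other, negative when dissimilar values do.
    weights[i][j] is the spatial weight between units i and j."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Four zones in a row; each zone neighbors the adjacent zone(s)
values = [10.0, 9.0, 2.0, 1.0]   # high values cluster at one end
weights = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]
print(morans_i(values, weights))  # ≈ 0.395: positive spatial autocorrelation
```

Shuffling the same values so that high and low alternate would drive the index negative, the signature of a dispersed pattern.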
• Conceptualization of space. Places are not isolated from each other. Social, economic and demographic interactions occur among adjacent or distant places. These interactions/relationships have a spatial dimension, as they unfold over space, so location and distance matter. As

--- Page 29 ---
mentioned, according to Tobler's first law of geography, “near things are more related than distant things.” To decode this rule, we have to define how “near” or “distant” an object should be from another object to be named as such, and how an object should be depicted to calculate these distance metrics (e.g., point, line, polygon). When we apply methods to analyze spatial data, we have to determine mathematically how close a “close” object is, how “contiguity” is defined, what size a neighborhood is and how we can integrate “space–time” analysis. This means that we have to set a number of geographical parameters to define the spatial relationships among objects. This is called the conceptualization of spatial relationships, and it is a major difference between the methods applied to spatial and aspatial data. The next section of this chapter elaborates on this topic.
• The choice of the geographical scale. The appropriate geographical scale should be selected prior to any geographical/spatial analysis, as it directly affects the selection of the data model, the data to be collected, the methods to be used and the way conclusions will be drawn. For example, a city may be represented as a point at a national scale, as a polygon at the regional scale or as a set of polygons (e.g., postcodes) at a more local scale. For a more detailed analysis, we may go deeper, to the census-tract level (usually an area of 1,[REDACTED PHONE] people).
• The choice of the scale of analysis of data. The scale of analysis is closely related to the geographical scale, but it is not quite the same (although the terms are often used interchangeably). The scale of analysis defines the size and shape of the region for which spatial statistics are calculated after the geographical scale has been set. It is essentially the level at which spatial phenomena are understood and is closely related to the problem in question. The scale of analysis can also be considered the geographical extent over which spatial and temporal variability is analyzed through the construction of an appropriate spatial weights matrix. The geographical scale, on the other hand, refers to the scale of the data (i.e., national scale [1:[REDACTED PHONE],[REDACTED PHONE]] or city scale [[REDACTED IP],[REDACTED PHONE]]). The scale of analysis might be different if we study a different variable within the same dataset. As such, although the geographical scale typically remains the same across the dataset, the scale of analysis depends on the spatial distribution of the attributes' values. Generally, larger distances reflect broader trends (e.g., east to west) and smaller distances reflect more local trends (e.g., between neighborhoods). If we use a large scale of analysis when we are looking at a local level, we might generalize and lose hidden spatial heterogeneity. In a hypothetical example, unemployment clustering might be evident at the [REDACTED PHONE] m and 1,[REDACTED PHONE] m scales of analysis, reflecting patterns of clustering at both the census block level and the postcode level. In practice, the scale of analysis refers to the distance at which the spatial features (e.g., postcodes) will be analyzed to calculate spatial statistics (see also Section 4.3).

--- Page 30 ---
• The modifiable areal unit problem (MAUP). Attribute data such as those from censuses or socioeconomic databases are often aggregated, for privacy or simplicity reasons, into predefined zones. The MAUP refers to the influence the zone design has on the outcomes of the analysis. A different designation would probably lead to different results. The main concern is that the definition of the zones (boundaries and extent) is arbitrary with respect to the specific geographical problem. For example, in many cities, postcodes are smaller in the center and become larger in the outskirts. How different would the statistical results be if postcodes were designed to have the same area? A typical and well-studied example of the MAUP is the [REDACTED PHONE] US presidential election. Al Gore obtained more votes than George Bush but lost the election because of the way counties were designed inside each state (O'Sullivan & Unwin [REDACTED PHONE], p. 38). A different designation of counties could have led to a different outcome. The MAUP relates to both the scale of the analysis and the aggregation of the data. In general, when we have larger areal units, we tend to aggregate data at a higher level, such that generalization is more evident. Put simply, when generalization exists, aggregated values tend to be more similar to the overall mean (global mean), and deviations tend to be milder. In such a case, we may lose valuable information.
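A toy numerical illustration of the MAUP (the income values and the two zonings are invented): the same six locations, aggregated under two different zone boundaries, yield different zone means, so any statistic computed on the zones inherits the zoning choice:

```python
# Hypothetical incomes at six locations ordered west to east
incomes = [10, 12, 30, 32, 50, 52]

def zone_means(values, cut):
    """Aggregate the ordered locations into two zones,
    splitting the study area after position `cut`."""
    west, east = values[:cut], values[cut:]
    return sum(west) / len(west), sum(east) / len(east)

# The same data under two different zone designs
print(zone_means(incomes, 3))  # boundary after the 3rd location
print(zone_means(incomes, 2))  # boundary after the 2nd location: different means
```

Neither zoning is “wrong”; the point is that the reported zone statistics depend on an arbitrary boundary decision.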
• Space heterogeneity: Nonuniform space. Space is not uniform. This is a major factor differentiating spatial data from nonspatial data. Our everyday life experience offers various examples of the non-smooth, noncontinuous and non-isotropic effects of space. For example, natural and planned breaks such as rivers, highways and parks alter space continuity. When space is not uniform, locations have different probabilities of a specific value/action or process. For example, land value might be significantly higher on one side of a river bank compared to the other. Spatial heterogeneity is the variation between the values of a set of observations inside the study area (Dall'erba [REDACTED PHONE]). When spatial heterogeneity exists, trends in specific directions (e.g., east to west) or large variations in neighboring observations might exist. For instance, socioeconomic differences are often very sharp in neighboring areas, something that reveals spatial heterogeneity (i.e., slums lie right next to high-income suburbs in many megacities).
When there is a nonuniform space or when spatial heterogeneity exists, collecting data might be problematic. Suppose a population is not distributed evenly across an urban area (population heterogeneity is high). If we collect and map stroke events across the city, we may find areas in which strokes are concentrated and form clusters. This does not necessarily mean that these areas have a higher risk of stroke events. The stroke-event clustering may be due to the fact that more people live in these areas (high population

--- Page 31 ---
heterogeneity). Fewer strokes are expected in less-populated areas. In such a
case, we have to take into account population distribution across space to
better trace if there is a linkage between location and unexpectedly high rates
of stroke. We might use measures of stroke density per capita for each subarea
inside the city and adjust for population heterogeneity.
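A back-of-the-envelope version of this adjustment (all counts and populations invented): raw counts point at the populous center, while rates per 10,000 residents point at the thinly populated area:

```python
# Hypothetical stroke counts and populations for three subareas
strokes = {"center": 90, "suburb": 30, "rural": 10}
population = {"center": 90_000, "suburb": 30_000, "rural": 2_000}

# Rates per 10,000 residents adjust for population heterogeneity
rates = {zone: 10_000 * strokes[zone] / population[zone] for zone in strokes}
print(rates)  # center and suburb: 10.0 per 10,000; rural: 50.0 per 10,000
```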
Spatial heterogeneity does not imply the absence of spatial autocorrelation. Inside a study area, we may have areas with high spatial heterogeneity and negative spatial autocorrelation, and other areas with positive autocorrelation in a nonuniform space. In fact, spatial dependence is not easily distinguished from spatial heterogeneity. This is also referred to in the literature as the “inverse problem” (Anselin [REDACTED PHONE]). In spatial dependence, the correlation or covariance among variables at distinct locations is determined by the spatial arrangement of the objects in geographic space (Anselin [REDACTED PHONE]). However, although clusters and patterns might be detected through various procedures, such as spatial autocorrelation tests, we cannot determine whether these clusters are due to structural change (heterogeneity) or to a true process that creates clusters irrespective of the space heterogeneity.
• The edge effects problem. In the edge effects problem, spatial units that lie in the center of the study area tend to have neighbors in all directions, whereas spatial units at the edges of the study area have neighbors only in some specific directions. Row standardization is typically used to account for this asymmetry in the count of neighbors (see the section on spatial weights later in this chapter).
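Row standardization itself is a one-line idea: each row of the (binary) weights matrix is divided by its row sum, so a unit's neighbors always share a total weight of 1, whether the unit sits at an edge with one neighbor or in the interior with many. A sketch with an invented three-unit chain:

```python
def row_standardize(w):
    """Divide each row of a binary contiguity matrix by its row sum,
    so every unit's neighbor weights sum to 1."""
    out = []
    for row in w:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out

# Unit 1 is 'interior' (two neighbors); units 0 and 2 are 'edges' (one each)
w = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
print(row_standardize(w))
# [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
```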
• The ecological fallacy. This problem occurs when a relationship that is statistically significant at one level of analysis is assumed to hold true at a more detailed level as well. This is a typical mistake that occurs when we use aggregated data to describe the behavior of individuals. For example, if at the postcode level the variable “higher income” is strongly correlated with “higher education obtained,” this does not necessarily mean that each person with higher education will have a high income. This is the fallacy problem: the belief that, if something holds true at an aggregation level, it is also true at a lower, more detailed level (e.g., the individual level). The correct interpretation is that postcodes linked to people with higher education tend to exhibit higher incomes, not that each individual with higher education will have a high income. To reach a conclusion about the individual level and how education is linked to income, we should conduct research at this level of analysis (getting data at the individual level, not aggregated to some other level).
Box 1.3 Data that should have been treated as spatial are often analyzed without taking into account their spatial dimensions. This happens either because there is a lack of spatial thinking among analysts or because there is a

--- Page 32 ---
Box 1.3 (cont. )
lack of knowledge of how to use spatial analysis tools. In this case, geographical
space is withdrawn from the analysis, and data that should have been studied
at various geographical and analysis scales are studied only at the scale for
which data are available. As the spatial dimension is omitted, then one or more
of the aforementioned reasons that make spatial data special emerge. For
example, if spatial autocorrelation exists, then applying classical statistics often
violates the assumptions essential for drawing statistically signi ﬁcant results,
and the outcomes may be biased (Lee & Wong[REDACTED PHONE] ). Thus, spatial data are
special, and new methods and techniques that take into account spatial rela-
tionships and spatial conceptualization should be used for their analysis.
1.4 Conceptualization of Spatial Relationships
Definition
Conceptualization of spatial relationships is the modeling of the relationships
and interactions between features across space. Put simply, it mathematically
defines the terms near, far, adjacent, contiguity, neighborhood, neighboring
and distance for a set of spatial objects by using specific values or functions.
Why Use
Referring once more to Tobler's first law of geography, objects that belong to
the same neighborhood or are close to each other share common characteristics
and are likely to interact more than those that are further away. Conceptualizing
spatial relationships is used to define what is to be regarded as close, far,
adjacent or neighboring and is essential prior to any geographical analysis and
spatial statistical tool implementation. Important decisions at this stage include
• The choice of the appropriate distance type to be applied (e.g., Euclidean,
Manhattan, travel time or inverse distance)
• The contiguity definition (e.g., by polygons sharing common borders or by
polygons in a predefined zone)
• The number of neighbors to be used for spatial statistics calculations (e.g.,
10 or just one)
• The method for selecting the nearest neighbor (e.g., by distance or by
contiguity; if by distance, is it Euclidean or Manhattan? If by contiguity,
do we consider those that share a border or those that overlap?)
Setting different spatial relationships leads to different spatial statistics
outcomes for the same dataset. For this reason, it is crucial to define spatial
relationships after thorough investigation, avoiding the default values that
most GIS software packages apply.

Interpretation
The more precise the conceptualization of spatial relationships for a set of
spatial objects, the more accurate the outcomes of the statistical tests and
models will be. If the applied conceptualization method fails to reflect the
inherent structure of the spatial relationships among the spatial features, the
outcomes of the analysis will be misleading. Suppose we want to define the
catchment area (neighborhood) of a coffee shop so that we can later conduct an
analysis of the socioeconomic characteristics of the people living or working
within this area. To model the spatial relationships between the people and the
coffee shop, we make the following assumption: people within a 1 km radius of
the coffee shop are more likely to interact with it than are people outside
this zone. This simple conceptualization of spatial relationships first defines
the shape of the neighborhood (circle) and then its size (1 km). Different
shapes (e.g., square or hexagon) and sizes (e.g., 2 km) could be used as well.
Obviously, a catchment area defined by a 10 km radius would provide inaccurate
results. It is more reasonable to expect that smaller distances would be more
appropriate, as the main target group of this type of store is people who live
or work within walkable distances. Different conceptualizations of the
catchment area will lead to completely different results.
Discussion and Practical Guidelines
There are various methods of conceptualizing spatial relationships, all of
which attempt to better model the inherent structure of a spatial dataset in
terms of spatial adjacency and neighborhood shaping. The following methods
can be used:
A. Distance. Define a distance threshold or a distance function (see Section
1.5):
• Distance type (e.g., Euclidean, Manhattan, Minkowski, network)
• Fixed distance band (distance threshold)
• Distance decay (distance function)
B. Adjacency. Define which objects are regarded as adjacent (see Sections
1.6, 1.7):
• Contiguity edges only (Rook's case)
• Contiguity edges corners (Queen's case)
• Higher-order contiguity
• Interaction (this includes distance and adjacency)
C. Neighborhood. Define what makes a neighborhood (see Section 1.8):
• k-nearest neighbors
• Proximity polygons
• Delaunay triangulation
D. Space–Time. Define distance and time windows for spatiotemporal
analysis (see the section on the space–time window later in this chapter)

To decide which approach to use, we have to examine our problem from a
conceptual perspective first and trace the potential spatial relationships. If we
study traffic, it is more rational to use network or Manhattan distance instead of
Euclidean distance. If we study population density in an urban agglomeration,
it is more appropriate to use a distance decay function (e.g., inverse distance),
as areas further away from the center are likely to be less densely populated.
Many spatial statistics tools require that a conceptualization method of
spatial relationships be set prior to any analysis. Some of the statistics
presented in detail in this book are as follows:
• Global spatial autocorrelation (Global Moran's I, General G-Statistic)
• Cluster and outlier analysis (Anselin Local Moran's I)
• Hot spot analysis (Getis-Ord Gi)
• Spatiotemporal autocorrelation (Bivariate Moran's I, Differential Moran's I)
• Generation of the spatial weights matrix
• Geographically weighted regression
1.5 Distance Measure
Among the most common distance measures used in geographical analysis are
the Euclidean distance, the Manhattan distance, the Minkowski distance,
Pearson's correlation distance, the Spearman correlation distance, the network
distance and the geodetic distance. In spatial statistics, Euclidean and
Manhattan distances are the most widely used.
Definitions and Formulas
For two points A and B, where X1, Y1 are the coordinates of point A and X2, Y2
are the coordinates of point B, measured on a projected coordinate system (a
plane surface using Cartesian coordinates), distance can be defined as follows:
• Euclidean distance is the distance between two points A and B connected
by a straight line, calculated as (1.1):

S = √((X2 − X1)² + (Y2 − Y1)²)  (1.1)

• Manhattan distance is the vertical plus horizontal difference (measured
along the axes) between points A and B, calculated as (1.2):

S = |X2 − X1| + |Y2 − Y1|  (1.2)
Manhattan distance can be used to model the distance between points in
urban environments when the street network is not available. Building
blocks create barriers, and we usually cannot move directly from point A to
point B in a straight line. Manhattan distance simulates this type of travel,
which is why it should be used when we study attributes related to travel
(e.g., access to schools, the zone of influence of a shop) in an urban
environment. When we analyze socioeconomic characteristics such as
population, income or education, Euclidean distance is a more rational choice.
• Minkowski distance is a generalization of the Euclidean and Manhattan
distances (1.3):

S = (|X2 − X1|^p + |Y2 − Y1|^p)^(1/p)  (1.3)

For p = 1, we obtain the Manhattan distance, and for p = 2, the Euclidean
distance. Minkowski distance is used, among others, in Principal Component
Analysis and Cluster Analysis (see Chapter 5).
Pearson's correlation distance considers two vectors to be close (similar) if
they are highly correlated. In a multivariate dataset and for two variables with
correlation coefficient r, Pearson's correlation distance is (1.4) (the
correlation coefficient r is defined later in this book):

S = 1 − r  (1.4)
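These four measures are straightforward to compute. The following Python sketch implements equations (1.1)–(1.4); the point coordinates and sample variables are invented for illustration:

```python
import math

def euclidean(a, b):
    # Straight-line distance (1.1)
    return math.sqrt((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2)

def manhattan(a, b):
    # Horizontal plus vertical difference along the axes (1.2)
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def minkowski(a, b, p):
    # Generalization (1.3): p = 1 gives Manhattan, p = 2 gives Euclidean
    return (abs(b[0] - a[0]) ** p + abs(b[1] - a[1]) ** p) ** (1 / p)

def pearson_distance(x, y):
    # Correlation distance (1.4): S = 1 - r
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return 1 - cov / (sx * sy)

A, B = (0.0, 0.0), (3.0, 4.0)
print(euclidean(A, B))     # 5.0
print(manhattan(A, B))     # 7.0
print(minkowski(A, B, 2))  # 5.0, identical to the Euclidean distance
```

Note how the Minkowski form reproduces the other two metrics as p varies, which is why it is used as a tunable distance in clustering.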
When we conceptualize spatial relationships by distance, in addition to
choosing which of the preceding distance metrics to use, we should choose
the range of distance within which spatial objects are regarded as having
spatial relationships. The range of distance is closely related to the scale of
analysis. Two widely used methods to define the range of distance are the
fixed distance band and distance decay.
1.5.1 Fixed Distance Band (Sphere of Influence)
Definition
A fixed distance band is a distance value expressing the size of a sphere of
influence around a spatial object. All spatial entities inside the sphere of
influence are weighted equally, while spatial entities outside the zone are
assigned zero weight.
Why Use
A fixed distance band is used to shape neighborhoods containing equally
treated spatial entities.
Interpretation
We define a sphere of influence by setting a fixed distance (a threshold distance
value). Within this zone, all spatial objects are weighted equally, while objects
outside the sphere are not accounted for in calculations (for weights matrix
calculation or spatial statistics), as they are not supposed to interact with
objects inside the zone (i.e., their weight is zero).
Choosing an appropriate distance band value is crucial, as it sets the scale of
the analysis and determines how spatial statistics will respond. For example,
most spatial statistics metrics require at least one neighbor for each

spatial feature. When the number of neighbors is small, the statistical results
(e.g., z-scores) are unreliable. A small distance band might lead to objects with
no neighbors. A large value might create excessively large neighborhoods and
thus undesirable aggregations. In addition, with large distance bands,
computational costs increase, especially for features with thousands of neighbors. We
should use a distance band ensuring that each spatial object has at least one
neighbor on the one hand and not too many on the other. When distributions
are skewed, around eight neighbors should be included to yield reliable
results. Various tools and procedures can be used to define the appropriate
distance band for a specific dataset or study area, as explained next.
Tools for Determining Distance Band
There is no universally optimal distance band for a specific problem. Given
heterogeneity in space, various scales of analysis can be applied. The
appropriate range of the distance band depends on the problem at hand and the
scope of the analysis. Analyzing the distribution of schools in the core of a
city in relation to the underlying population would require a relatively small
scale of analysis, ranging from a few hundred meters to 1–2 km, as larger
distances would not be reasonable for the daily transfer of children to schools.
On the other hand, analyzing the distribution of hospitals for the same case
study would require a larger distance band, as hospitals have a larger
catchment area.
Common approaches to selecting the distance band include the following:
• Select a distance based on previous experience. If the researcher has
solid knowledge of the spatial phenomenon studied and the case study
area, or if there is a rich literature supporting a specific range of values for
the problem, then a distance band can be set accordingly.
• Use a distance band that ensures each object has at least one neighbor. Still,
in the presence of locational outliers (see Chapter 3), this approach is
inappropriate, as the distance band will have to be very large. Although it
ensures at least one neighbor even for the outliers, it may create neighborhoods
of hundreds of objects for others, so the scale of analysis would be misleading.
To overcome this problem, we should exclude locational outliers and calculate
the distance at which each remaining object has at least one neighbor. We
should also set a minimum number of neighbors for those objects that do not
have any neighbor inside the selected distance band. Using this approach, we
define a more realistic distance band, ensuring that outliers are also included
by calculating their weights based not on distance but on the count of nearby
neighbors (for more details, see Section 4.3).
• Select a distance at which spatial autocorrelation is most pronounced
(see Chapter 4). When spatial clustering exists, spatial autocorrelation
techniques can be used to determine the appropriate distance band for
the analysis. Such techniques include incremental spatial autocorrelation,
optimized hot spot analysis and Ripley's K-function.
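The second approach above (a band guaranteeing at least one neighbor) has a simple characterization: the smallest such band equals the largest nearest-neighbor distance in the dataset, which is exactly why a single locational outlier inflates it. A minimal Python sketch, with invented coordinates:

```python
import math

def min_band_with_one_neighbor(points):
    """Smallest distance band at which every point has at least one
    neighbor: the maximum, over all points, of the distance to each
    point's nearest neighbor."""
    nn_distances = []
    for i, p in enumerate(points):
        # Distance from p to its closest other point
        d = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nn_distances.append(d)
    return max(nn_distances)

pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(min_band_with_one_neighbor(pts))               # 1.0
# A single locational outlier inflates the required band:
print(min_band_with_one_neighbor(pts + [(10, 10)]))  # ~12.73
```

Excluding the outlier first, as the text recommends, would keep the band at 1.0 here.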

Discussion and Practical Guidelines
Before we define an appropriate distance band, we have to trace locational
outliers (see Chapter 3). If the data are skewed, then the distance band should
be neither so small that it includes only one or two neighbors nor so large that
neighborhoods are as large as the study area, because the z-scores computed
in the various spatial statistics tests would be unreliable. With skewed data,
z-scores are reliable when each feature has approximately eight neighbors.
Most software packages calculate a default band whereby each object has at
least one neighbor. Still, this is not always a suitable choice, especially when
locational outliers exist (see the preceding section, "Tools for Determining
Distance Band"). A fixed distance band is appropriate when we have a good
theoretical background for the problem or good knowledge of the case study
area. A fixed distance band also works well for point data and for polygon data
in hot spot analysis or other local spatial autocorrelation techniques.
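In code, the sphere of influence described above reduces to binary weighting: 1 for entities inside the band, 0 outside. A minimal sketch (coordinates invented; this is an illustration, not any particular package's implementation):

```python
import math

def fixed_band_weights(points, band):
    """Binary spatial weights: w[i][j] = 1 if point j lies within the
    distance band of point i (and j != i), else 0 -- the 'sphere of
    influence' weighting."""
    n = len(points)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= band:
                w[i][j] = 1
    return w

pts = [(0, 0), (1, 0), (5, 0)]
w = fixed_band_weights(pts, band=2.0)
print(w)  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

The third point ends up with no neighbors at all, illustrating the "too small band" pitfall discussed above.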
1.5.2 Distance Decay
Definition
Distance decay is any function that implies a continuous, smooth and
attenuating effect of distance on the attribute values of neighboring spatial
entities (Longley et al.; see Figure 1.3).
Why Use
Distance decay is used to express the spatial interaction among various
locations by applying weights that change with distance, so that closer
objects have larger weights (stronger interaction) than those further away. It
is the practical implementation of Tobler's first law of geography.
Interpretation
The distance decay concept considers that the physical or socioeconomic
interaction between two points declines over space in a systematic way relative
to distance. Various types of functions can be used, including the reciprocal
function (or inverse distance; see Figure 1.3A), the negative power function
(inverse distance squared for a power of 2; see Figure 1.3B), the negative
exponential function, or a linear function with a negative slope (which is
uncommon). Inverse distance has a milder slope than inverse distance squared.
A typical distance decay relationship (inverse distance or inverse distance
squared) could be used to describe the height of the buildings in a city: the
further away from the center, the lower the buildings become. Similarly for
population density: the further away from the city center, the lower the density.
Discussion and Practical Guidelines
It is generally recommended to use a fixed distance band in conjunction with a
distance decay function (either as a cutoff point or as a zone of indifference):

• Fixed distance band applied as a cutoff point (see Figure 1.3C). Beyond
this point, distance decay is not calculated, as no interaction is likely to
be evident; any analysis using distances larger than this value would not
provide valuable information.
• Fixed distance band applied prior to the distance function (zone of
indifference) (see Figure 1.3D). In this case, we define a distance within
which all spatial objects are treated equally. Beyond this distance, a decay
function is applied, and weights are calculated accordingly. This approach
is suitable when there is a zone in which all objects have the same
interaction. Instead of imposing sharp boundaries after this zone, we apply
a distance function, so that there is a smooth effect of distance on the
values of the neighboring entities.

Figure 1.3 Distance decay graph. The larger the distance between two points,
the smaller their impact on each other. (A) Inverse distance has a similar but
milder slope than inverse distance squared. (B) We may apply inverse distance
squared when we have evidence that distance influences objects to a higher
degree than inverse distance does. (C) A threshold value (or fixed distance
band) can be used to terminate a distance decay function. (D) A fixed distance
band can be used prior to a distance decay function. This distance is also
called the "zone of indifference," as all objects inside it are treated equally.
Beyond this distance, objects are weighted according to the distance decay
function.
Combining distance decay functions with a fixed distance band reduces
computational time and reflects reality better. As objects at very large
distances are likely to exhibit negligible interactions, their weights are
almost zero, so instead of calculating them, we terminate the process at a
predefined distance value.
Distance decay functions may be used for continuous data. For instance,
inverse Euclidean distance may be used to model temperature or pollution.
On the other hand, inverse Manhattan distance might be more appropriate for
point analysis inside urban environments (e.g., customer analysis) when the
road network is not available. In hot spot analysis, inverse distance decay
should be avoided, as it tends to produce small and isolated hot or cold spots
(see Chapter 4).
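The two combinations of a band with a decay function can be sketched as follows. This is a hedged illustration: the smooth (band/d)^power form used beyond the zone of indifference is one possible choice that avoids a jump at the band, not a formula taken from this book:

```python
def decay_weight(d, band, mode="cutoff", power=1):
    """Inverse-distance weight combined with a fixed band (illustrative):
    - 'cutoff': 1/d^power inside the band, 0 beyond it (cf. Figure 1.3C)
    - 'indifference': 1 inside the band, then a smooth decay
      (band/d)^power beyond it, so the weight falls off gradually
      rather than dropping sharply (cf. Figure 1.3D)"""
    if mode == "cutoff":
        return 0.0 if d > band else 1.0 / d ** power
    if mode == "indifference":
        return 1.0 if d <= band else (band / d) ** power
    raise ValueError("unknown mode")

print(decay_weight(2.0, band=5.0))                                 # 0.5
print(decay_weight(6.0, band=5.0))                                 # 0.0
print(decay_weight(2.0, band=5.0, mode="indifference"))            # 1.0
print(decay_weight(10.0, band=5.0, mode="indifference", power=2))  # 0.25
```

Raising power from 1 to 2 makes the decay steeper, matching the inverse distance versus inverse distance squared contrast described above.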
Example
Suppose you want to select a coffee shop for your daily coffee consumption,
and distance from your home is one of the most important criteria (in real life,
additional parameters determine this choice, but we will use only distance; see
Figure 1.4). It does not make a significant difference if one coffee shop is a
few tens of meters closer to your home than another. In fact, if there are two
coffee shops and the second is 50 m further from your location, you will most
likely walk the extra 50 m if the second one is better. If the distance is 1 km,
however, you may think twice about walking that far. Thus, we can use a fixed
distance band of, say, a few hundred meters around your home within which all
available coffee shops are treated equally by you in terms of willingness to
walk to reach them. After this distance, we may apply a distance decay
function; for example, going to a coffee shop just 30 m beyond your cutoff
point is not irrational, whereas coffee shops well outside this cutoff distance
receive proportionally lower weights in relation to distance and are not as
likely to be visited as those inside the zone of influence. On the other hand,
if you want to go to a pharmacy as soon as possible, then you do care which one
is the closest. Inverse distance squared would be an appropriate
conceptualization method in this case, as you want to give more
weight/importance to the pharmacies closest to you. Finally, if you change mode
and aim to travel by bicycle, then you could elect to buy some food from your
local grocery at a greater distance. Inverse distance would then be appropriate.

Figure 1.4 Examples of different distance decay functions used per analysis.
1.6 Contiguity: Adjacency Matrix
1.6.1 Polygons Contiguity
Definition
Contiguity is a spatial property describing whether a target object and one or
more other objects are in close proximity. In practice, contiguity refers to
which polygons are assigned as neighbors of a single target object. The most
common contiguity conceptualization methods for polygon features are:
• Contiguity edges only (Rook's case). Only those polygons that share a
common border (edge) are regarded as neighboring and are included in
calculations for the target polygon (see Figure 1.5A).
• Contiguity edges corners (Queen's case). Polygons that share borders and
also have common corners (nodes) are considered neighbors and are
included in calculations for the target polygon (see Figure 1.5B).
Why Use
Contiguity is used to define the neighborhoods used in the calculation of the
spatial weights matrix and various spatial statistics.
Discussion and Practical Guidelines
The preceding methods of contiguity conceptualization are appropriate when we
model data or contiguous processes represented by polygons in order to define
neighborhoods or some type of interaction. The order of contiguity is also
important in defining a neighborhood. First-order contiguity considers
interaction and neighborhood only for the immediately contiguous features,
while second-order contiguity reflects that the target object is affected by
the neighbors of its neighbors (Anselin). Something that should be considered

while calculating higher-order contiguity weights is whether or not the
lower-order neighbors should be included (Anselin; see Figure 1.5B). When
higher order does not include lower order, it can be referred to as exclusive
(or pure) higher-order contiguity (Anselin). When higher-order contiguity
additionally includes lower order, it is named inclusive higher-order
contiguity (Anselin). For example, second-order contiguity that includes first
order as well is named inclusive second-order contiguity. Pure higher-order
contiguity (which does not include lower-order neighbors) is used when studying
the effects of spatial lags on spatial autocorrelation and on spatial
autoregressive models, where redundant and circular paths of polygon features
are not taken into account (Anselin).

Figure 1.5 (A) Rook's (shared border) definition of contiguity. The polygons
immediately adjacent to target feature D are A, B, C, E and F (first-order
contiguity). (B) Queen's case (shared borders and corners). Polygons G and H
are additionally included in the neighborhood of D, which is now composed of
A, B, C, E, F, G and H (first-order contiguity). (C) Exclusive (pure)
second-order contiguity for Rook's case for D includes G, H, I, K and L, as
these are the neighbors of the first-order contiguity neighbors (A, B, C, E and
F) of the target feature. In the case of inclusive second-order contiguity, the
neighborhood of D is composed of both the neighbors and the neighbors of
neighbors (A, B, C, E, F, G, H, I, K and L).
Rook's or Queen's contiguity conceptualizations behave well when we consider
that spatial interaction increases if two polygons share an edge, a node or
both. In general, if the polygons are of similar size, then polygon contiguity
is an appropriate conceptualization method. For polygons of different sizes
(e.g., small polygons in the city center and large polygons on the outskirts),
polygon contiguity applies different scales of analysis, which is undesirable.
For this reason, row standardization should be used with polygon contiguity for
spatial weights matrix calculation (see Section 1.9).
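For a regular grid of cells, the two contiguity definitions can be illustrated in a few lines of Python. The grid setting is an assumption made for clarity (the figure's polygons are irregular), with cells indexed by (row, col):

```python
def rook_neighbors(cell, nrows, ncols):
    # Rook's case: cells sharing an edge (up, down, left, right)
    r, c = cell
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return {(r + dr, c + dc) for dr, dc in steps
            if 0 <= r + dr < nrows and 0 <= c + dc < ncols}

def queen_neighbors(cell, nrows, ncols):
    # Queen's case: cells sharing an edge or a corner (all 8 directions)
    r, c = cell
    return {(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
            and 0 <= r + dr < nrows and 0 <= c + dc < ncols}

print(len(rook_neighbors((1, 1), 3, 3)))   # 4
print(len(queen_neighbors((1, 1), 3, 3)))  # 8
print(len(rook_neighbors((0, 0), 3, 3)))   # 2
```

The corner cell's reduced neighbor count is the edge effects problem from earlier in this chapter, and a reason why row standardization is applied.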
1.6.2 Adjacency Matrix
Definition
In the context of spatial analysis and for polygon representation, the adjacency
matrix is a square matrix whose elements indicate whether pairs of polygons are
adjacent or not.
Why Use
An adjacency matrix is used to represent the various forms of contiguity and to
define neighborhoods for further spatial analysis.
Interpretation
An adjacency matrix is a symmetric matrix whose off-diagonal elements take
values of either 0 or 1 and whose diagonal elements have no values. If two
spatial entities are adjacent (either first-order or higher-order), the
corresponding matrix element is set to 1, and 0 otherwise. For a given target
polygon in the matrix, the polygons with an adjacency value of 1 (in the same
row or column) define its neighborhood (the set of features to be taken into
account when calculating a spatial statistic for the target feature).
Discussion and Practical Guidelines
For the shaded polygons depicted in Figure 1.5A and Rook's case contiguity
(shared edges only), the adjacency matrix is (1.5):

Adj =
        A   B   C   D   E   F   SUM
  A     *   1   1   1   0   0    3
  B     1   *   0   1   0   0    2
  C     1   0   *   1   1   0    3
  D     1   1   1   *   1   1    5
  E     0   0   1   1   *   0    2
  F     0   0   0   1   0   *    1
  SUM   3   2   3   5   2   1         (1.5)

We use an asterisk (*) for a polygon's adjacency with itself. The sum of each
row or column equals the total number of polygons adjacent to each single
polygon. If we define the neighborhood of a target object (polygon) as those
polygons that are directly adjacent (first-order), then a polygon is a neighbor
of the target object if the corresponding value in the matrix is 1; otherwise,
they are not neighbors. For example, polygon A has three first-order neighbors
(B, C, D).
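The construction of such a matrix can be sketched in Python. The edge list below is a hypothetical reconstruction for illustration; the text only states that A borders B, C and D and that D borders A, B, C, E and F:

```python
# Build a first-order (Rook's) adjacency matrix from a list of shared
# borders. The edge list is hypothetical, chosen to be consistent with
# the neighbors stated in the text.
labels = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"),
         ("C", "D"), ("C", "E"), ("D", "E"), ("D", "F")]

idx = {name: i for i, name in enumerate(labels)}
n = len(labels)
adj = [[0] * n for _ in range(n)]
for u, v in edges:
    adj[idx[u]][idx[v]] = 1  # adjacency is symmetric:
    adj[idx[v]][idx[u]] = 1  # if u borders v, then v borders u

# Row sums give each polygon's number of first-order neighbors
for name in labels:
    print(name, adj[idx[name]], "neighbors:", sum(adj[idx[name]]))
```

Because each shared border is recorded in both directions, the resulting matrix is symmetric, and its row sums reproduce the neighbor counts discussed above (3 for A, 5 for D).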
1.7 Interaction
Definition
Interaction is the degree of linkage between two locations, the origin and the
destination. It is calculated as a combination of distance and adjacency. In its
simplest form, its formula is (1.6):

Interaction = (w_ij × P_i × P_j) / d_ij^b  (1.6)

where
i, j are the locations denoting the origin and the destination, respectively
w_ij is some sort of weight between i and j; for example, w_ij might be 0 if
the objects are not adjacent and 1 if they are
P_i and P_j are their respective values for a certain variable (e.g., population)
d_ij is any distance function between i and j
b is an exponent determining the rate of decline
Why Use
In spatial statistical analysis, calculating interaction can be used as a way to
calculate weights in a spatial weights matrix (see Section 1.9).
Interpretation
The larger the value, the stronger the interaction between two locations.
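Equation (1.6) can be transcribed directly; the populations and distances below are invented for illustration:

```python
def interaction(w_ij, p_i, p_j, d_ij, b=2.0):
    """Simple spatial interaction (1.6): w * Pi * Pj / d^b.
    Larger values indicate a stronger linkage between two locations."""
    return w_ij * p_i * p_j / d_ij ** b

# Two adjacent towns (w = 1) with populations 10,000 and 5,000, 10 km apart
print(interaction(1, 10_000, 5_000, 10))  # 500000.0
# Doubling the distance cuts the interaction to a quarter when b = 2
print(interaction(1, 10_000, 5_000, 20))  # 125000.0
```

Note how the exponent b controls how quickly interaction decays with distance, the same role it plays in the distance decay functions of Section 1.5.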
Discussion and Practical Guidelines
Spatial interaction models are mainly used to study spatial flows, such as
migration, tourism, commuting, international trade and money flows. For a
spatial interaction to occur, three independent conditions must exist:
• A supply set of locations (destinations) and a demand set of locations
(origins) (e.g., commuters [demand] traveling to their jobs [supply])
• Alternative locations for the points of origin or destination
• Origins and destinations that are linked

1.8 Neighborhood and Neighbors
Definition
Neighborhood in the spatial analysis context is a geographically localized area
to which local spatial analysis and statistics are applied based on the hypothesis
that objects within the neighborhood are likely to interact more than those
outside it.
Why Use
Defining the appropriate neighborhood is necessary for the accurate
performance of spatial statistics. Most of these statistics require a
neighborhood definition and the construction of a spatial weights matrix that
reflects the intensity of the relationships among the spatial entities.
Interpretation
The size of a neighborhood determines how observations are aggregated.
Neighborhoods with too few or too many objects will probably yield unreliable
statistical results.
Discussion and Practical Guidelines
Defining a neighborhood is not a trivial task, as there are many ways to
conceptualize its formation. As explained previously, the distance band
(Section 1.5.1) and polygon adjacency (Section 1.6) methods can be used
to define neighborhoods. Other conceptualization methods include k-nearest
neighbors, space–time proximity, proximity polygons and Delaunay
triangulation, described next.
1.8.1 k-Nearest Neighbors (k-NN)
Definition
In the context of spatial analysis, k-NN is a method used to define a
neighborhood based on the k nearest neighbors of the target object.
Why Use
This method ensures that each object has at least a certain number of
neighbors. More broadly, it is used to
• Define the neighborhood (region) whose spatial objects will be accounted
for in spatial statistics calculations
• Define the neighborhood in which spatial objects are likely to be more
similar than objects at further distances
• Model spatial relationships

Interpretation
k-NN is based on the calculation of a distance matrix used to store any type of
distance among all objects in the dataset.
For example, the distance matrix (based on Euclidean distance) for all the
polygon pairs in Figure 1.6 has the form (1.7), where d(i, j) denotes the
Euclidean distance between polygons i and j:

Distance =
           A        B        C        D        E
  A        *      d(A,B)   d(A,C)   d(A,D)   d(A,E)
  B      d(A,B)     *      d(B,C)   d(B,D)   d(B,E)
  C      d(A,C)   d(B,C)     *      d(C,D)   d(C,E)
  D      d(A,D)   d(B,D)   d(C,D)     *      d(D,E)
  E      d(A,E)   d(B,E)   d(C,E)   d(D,E)     *          (1.7)
For k = 2, the nearest neighbors matrix (NN) is (1.8):

NN =
        A   B   C   D   E   SUM
  A     *   0   1   1   0    2
  B     1   *   0   1   0    2
  C     1   0   *   1   0    2
  D     0   0   1   *   1    2
  E     0   0   1   1   *    2
  SUM   2   0   3   4   1         (1.8)
To build the k = 2 nearest neighbors matrix (NN), for each row of the original
distance matrix (1.7) we keep the two (k) smallest values and replace them with
1. All other values are replaced with 0. This matrix (NN) resembles an
adjacency matrix, but it is not symmetric. Each row sums to 2 (as many as k),
but each column sums to a different value. A column's sum reveals how many
times a specific object (the header of the respective column) is among the
nearest (either first or second in our case) neighbors of other polygons.
Figure 1.6 Example of k-nearest neighbors and distance matrix calculation.

For k = 2, each neighborhood consists of three spatial objects (the two
neighbors plus the target polygon). For example, the nearest neighbors to
polygon B (check its row) are polygons A and D (see Figure 1.6). Polygon D is
closer to B than A is and is thus called the "first nearest neighbor of B,"
while A is called the "second nearest neighbor of B" (see the distance matrix).
By inspecting rows A and D, we notice that polygon B is not nearest to either
of them. This is not illogical, as other polygons are closer to them than B is.
Thus, the columns do not all add up to the same number, but each row adds up
to two (k).
Discussion and Practical Guidelines
Setting the optimal number of k is not trivial. The selection is a trade-off
between creating neighborhoods that are homogeneous in characteristics and
of similar areal size. If spatial objects are distributed based on a competitive
process – that is, if objects tend to lie at similar distances from each other
throughout the entire study area – the k-nearest neighbors method produces
reliable results when used in spatial statistics. Still, this method is vulnerable to
polygon size or point density. When the distribution is not uniform or polygon
sizes vary widely, using k-nearest neighbors might significantly change the
scale of analysis. For example, postcodes in a city's business center are usually
small in size, while postcodes on a city's outskirts are usually larger. If the same
number of neighbors is set, the neighborhoods at the outskirts will be far larger
than those in the center and will probably aggregate data, leading to the loss of
valuable information. Thus, using a fixed k for the entire study area is not always
desirable. It is advisable to use a distance decay function (with a distance band)
and k-nearest neighbors only for those objects that do not have any neighbor
within the fixed distance band. This will ensure that all objects have at least one
neighbor. When the distribution of values associated with the objects is
skewed, a rule of thumb indicates that each object should have around eight
neighbors.
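The asymmetry of the k-nearest-neighbor relation described above can be sketched in a few lines of Python. The centroid coordinates below are invented for five hypothetical polygons A–E (they are not the distances of Figure 1.6), and scipy's k-d tree is used for the neighbor query:

```python
# k-nearest neighbors for polygon centroids (hypothetical coordinates,
# not those of Figure 1.6). Each row of the result has exactly k
# neighbors, but a polygon need not appear in its neighbors' own lists.
import numpy as np
from scipy.spatial import cKDTree

labels = ["A", "B", "C", "D", "E"]
centroids = np.array([
    [0.0, 3.0],   # A
    [3.0, 3.0],   # B
    [1.0, 0.0],   # C
    [3.0, 1.0],   # D
    [6.0, 2.0],   # E
])

k = 2
tree = cKDTree(centroids)
# Query k + 1 points, because the nearest point to each centroid is itself.
dist, idx = tree.query(centroids, k=k + 1)

# Drop the self-match (column 0) to get each polygon's k neighbors.
knn = {labels[i]: [labels[j] for j in idx[i][1:]] for i in range(len(labels))}
print(knn)
```

With these coordinates, D is B's first nearest neighbor and A its second, while E counts B and D among its neighbors but appears in neither B's nor D's list – every row has exactly k = 2 entries, yet the column counts differ, exactly as the distance-matrix example describes.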
Space–Time Window
Deﬁnition
A space–time window is a conceptualization method for defining a neighbor-
hood based on both distance and time. An object is in the same neighborhood
as the target object if it falls within both the specified distance and the
specified time interval.
Why Use
This method can be used to identify spatiotemporal hot spots or clusters
formed on the basis of spatial and temporal proximity.
Interpretation
Spatial entities close to each other in both space and time are analyzed
together, creating a spatiotemporal neighborhood. If a feature lies near
another in terms of distance but not in terms of time (or vice versa), it is not
included in the spatial statistics calculations. In other words, two spatial entities
close in distance might not be included in a spatiotemporal neighborhood if
their temporal distance is large.
Discussion and Practical Guidelines
A distance band and a time window should be specified. If, for example, we
provide a distance of 1 km and a time interval of three hours, all features found
within 1 km of a target object that also have a date/time stamp within three
hours of it would be analyzed together, as they would form a spatiotemporal
neighborhood. Selecting the appropriate distance band and time window is
not trivial, and good knowledge of the region and the problem at hand is
needed.
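As a minimal sketch, the 1 km / three-hour rule just described can be written out directly. The event names, coordinates (in metres) and timestamps below are invented for illustration:

```python
# Space-time window neighborhood: two events are neighbors only if they
# fall within BOTH the distance band and the time window.
# All events, coordinates (metres) and timestamps are made up.
from datetime import datetime, timedelta
import math

events = [
    ("e1", (0, 0),     datetime(2020, 5, 1, 10, 0)),
    ("e2", (600, 300), datetime(2020, 5, 1, 12, 0)),   # near in space and time
    ("e3", (700, 100), datetime(2020, 5, 2, 10, 0)),   # near in space, far in time
    ("e4", (5000, 0),  datetime(2020, 5, 1, 10, 30)),  # near in time, far in space
]

DISTANCE_BAND = 1000.0             # metres
TIME_WINDOW = timedelta(hours=3)

def st_neighbors(target, others):
    name, (x, y), t = target
    out = []
    for oname, (ox, oy), ot in others:
        if oname == name:
            continue
        close_in_space = math.hypot(ox - x, oy - y) <= DISTANCE_BAND
        close_in_time = abs(ot - t) <= TIME_WINDOW
        if close_in_space and close_in_time:
            out.append(oname)
    return out

hood = {e[0]: st_neighbors(e, events) for e in events}
print(hood)
```

Here e3 is close to e1 and e2 in distance but a day apart in time, and e4 is close in time but far in space, so neither joins the spatiotemporal neighborhood – only e1 and e2 are analyzed together.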
Proximity Polygons
Deﬁnition
Proximity polygons divide space into regions such that each location in a
region is closer to the region's generating point (centroid) than to any other.
Proximity polygons are also called Thiessen polygons or Voronoi polygons
(O'Sullivan & Unwin, p. 43).
Why Use
With proximity polygons, space is partitioned evenly with regard to proximity.
Thiessen and Voronoi polygons are tessellation methods that divide a plane
into non-overlapping polygons; they can be used for spatial interpolation and
to estimate catchment areas for public services or commercial businesses (Illian
et al., p. 46).
Interpretation
For a given set of point features (e.g., post offices, police stations, coffee
shops), proximity polygons can be used to create regions in which
• Each entity (e.g., post office) is the centroid of the polygon created
• Each point in space is included in only one polygon (no overlapping polygons)
• Each point in space is closest to the entity (e.g., post office) of the
polygon centroid it belongs to
• Points on the boundaries lie at the same distance from the centroids of
the respective polygons
Discussion and Practical Guidelines
By using proximity polygons, we create neighborhoods, or zones of influence,
for a specific spatial entity. Taking post offices as an example, someone living
within a specific proximity polygon is included in the neighborhood of the post
office that lies at the centroid of this polygon. In environmental analysis,
Thiessen polygons are used to estimate values of a continuous field (e.g.,
temperature, pollution). Any point inside the polygon will have the same value
as the one at the centroid of the polygon. However, the sharp change in values
between neighboring points lying close to (i.e., just inside and outside of) a
shared boundary of two adjacent polygons can make this representation
problematic. Delaunay triangulation is a smoother way to interpolate values,
as explained next.
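The defining property of proximity polygons – every location belongs to the region of its nearest generating point – can be sketched with a nearest-seed lookup. The three "post office" coordinates below are invented; for the actual polygon geometry, scipy also provides `scipy.spatial.Voronoi`:

```python
# Defining property of proximity (Thiessen/Voronoi) polygons: every
# location falls in the region of its nearest generating point.
# Seed coordinates are invented (say, three post offices).
import numpy as np
from scipy.spatial import cKDTree

post_offices = np.array([[1.0, 1.0], [4.0, 1.0], [2.5, 4.0]])
tree = cKDTree(post_offices)

# Each query point falls in exactly one region: that of its nearest seed.
queries = np.array([[1.2, 0.8], [3.9, 1.5], [2.4, 3.0]])
_, region = tree.query(queries)
print(region.tolist())   # index of the post office "serving" each point
```

Because assignment is by nearest seed, the regions are non-overlapping and cover the whole plane, which is exactly the tessellation the text describes.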
Delaunay Triangulation and Triangular Irregular Networks (TIN)
Deﬁnition
Delaunay triangulation partitions space by creating triangles from point fea-
tures or polygon centroids whose proximity polygons share an edge (O'Sullivan
& Unwin, p. 51).
Why Use
This method is suitable in cases where isolated polygons (e.g., islands) exist in the
dataset or where the spatial distribution of objects is irregular (e.g., when large
polygons are mixed with adjacent small polygons scattered across the study area).
Interpretation
Points or centroids connected by triangle edges are regarded as neighbors.
This ensures that each object will have at least one neighbor.
Discussion and Practical Guidelines
The edges of the triangles constructed using Delaunay triangulation are non-
overlapping, creating a set of triangular facets that efficiently depict surfaces. These
triangles form a network called a triangular irregular network (TIN). Regarding
space conceptualization, Delaunay triangulation creates neighborhoods consisting
of quite regular triangles, as the minimum interior angle of all triangles is maxi-
mized and long, thin triangles are thus avoided. In a more general context, calculating
TINs based on the heights of points and combining them with vector data such as
roads, streams or mountain peaks provides a more realistic view of the earth's
surface.
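The neighbor definition implied here – two points are neighbors if a triangle edge joins them – can be extracted directly from a triangulation. The coordinates below are hypothetical polygon centroids:

```python
# Delaunay triangulation as a neighborhood definition: two points are
# neighbors if a triangle edge joins them. Coordinates are hypothetical
# polygon centroids, not data from the book.
import numpy as np
from scipy.spatial import Delaunay

pts = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.5]])
tri = Delaunay(pts)

# Build neighbor lists: every pair of vertices sharing a triangle is linked.
neighbors = {i: set() for i in range(len(pts))}
for simplex in tri.simplices:          # each simplex lists 3 point indices
    for a in simplex:
        for b in simplex:
            if a != b:
                neighbors[a].add(b)

for i in sorted(neighbors):
    print(i, sorted(neighbors[i]))
```

Note that every point ends up with at least one neighbor (each point is a vertex of some triangle) and the relation is symmetric – the property that makes Delaunay-based conceptualization attractive for isolated or irregularly distributed objects.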
1.9 Spatial Weights and Row Standardization
Deﬁnitions
Spatial weights are numbers that reflect some sort of distance, time or cost
between a target spatial object and every other object in the dataset or
specified neighborhood. Spatial weights quantify the spatial or spatiotemporal
relationships among the spatial features of a neighborhood.
Spatial weights matrix is the matrix that stores the spatial weights.
Row standardization is the process of scaling the spatial weights to a range
between 0 and 1. It is used to avoid biased data sampling or when data are
aggregated from larger datasets.

Why Use
A spatial weights matrix is used to depict the degree of connection among the
objects inside a specific neighborhood (Dall'erba).
Interpretation
Any method of conceptualizing spatial relationships ends up creating a spatial
weights matrix. This matrix quantifies the spatial relationships among the
objects and is then used as the fundamental input for most spatial analysis
techniques. In its simplest form, it takes binary values (when used with fixed
distance band, k-nearest neighbors and contiguity spatial relationships): 1 if
two objects have a spatial relationship and 0 if they do not. For a distance
decay function, the interaction conceptualization method or user-defined
weights, the matrix elements can take any meaningful value. The larger the
weight, the stronger the relationship.
Applying the inverse distance (1/Distance) conceptualization method to
the polygons depicted in Figure 1.6, the elements of the spatial weights matrix
(1.9) reflect the inverse of the distance of the corresponding pair of polygons
(see distance matrix (1.7)):
Spatial Weights (1.9)
[Inverse-distance weights matrix for polygons A–E: rows and columns are labeled A–E, diagonal entries are marked ∗ (no self-weight), each off-diagonal entry is the inverse of the corresponding distance in matrix (1.7), and a SUM column gives the row totals. The numeric entries are illegible in this copy.]
A spatial weights matrix is almost always generated automatically by the
software when we apply spatial statistics after defining a conceptualization
method.
Discussion and Practical Guidelines
Row standardization is a method for scaling weights to a range between 0 and
1. Each weight is divided by either the sum of all weights in its row or the sum
of the weights of the neighboring features (1.10). By doing so, results are adjusted
so that the number of neighbors has no effect on the final results.
Standardized Spatial Weights (1.10)
[Row-standardized weights matrix for polygons A–E, obtained by dividing each element of matrix (1.9) by its row sum: diagonal entries are marked ∗, every row sums to 1, and a SUM row gives the column totals. The numeric entries are illegible in this copy.]

In the standardized matrix (1.10), every row adds up to 1. Column totals reflect
how much interaction a spatial object has. For example, object D has the
largest column sum, revealing that it has the strongest interaction with all the
others. This also reveals a more central location. Object B has the lowest
column total, indicating the least interaction with the remaining polygons. In
the preceding example, we calculated the spatial weights matrix for all objects
in the region. However, it is common to calculate spatial weights only for those
objects that belong to the same neighborhood (according to the adjacency matrix).
Row standardization is recommended when there is a potential bias in the
distribution of spatial objects and their attribute values due to poorly designed
sampling procedures. For example, when we collect samples that are clustered
in some parts of the study area (as a result of a poorly designed sampling
method), some objects will have a higher probability of having more neighbors
at close distances. As such, attribute values are likely to cluster together due to
spatial autocorrelation. Row standardization mitigates these effects.
Row standardization should also be used when polygon features refer to
administrative boundaries or any type of man-made zones. Census or socio-
economic data are often aggregated from larger datasets to polygons of
unequal sizes. How polygons are designed might influence the outcomes
of spatial analysis, something that is termed the modifiable areal unit prob-
lem (see Section 1.3). For this reason, it is better to standardize weights as a
percentage of the sum of the neighboring ones. This allows us to generalize
our results at a higher level and mitigate zone design problems.
Row standardization is also recommended for binary weights matrices, fixed
distance band and polygon contiguity conceptualizations, to adjust for a differ-
ent number of neighbors per observed object. Lastly, row standardization is
not required when analyzing point features (e.g., traffic accidents, crime events)
that have not resulted from aggregation or any sampling procedure.
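A numeric sketch of inverse-distance weights and their row standardization can make the row-sum and column-sum behavior concrete. The symmetric distance matrix below is invented (it is not the distance matrix of Figure 1.6):

```python
# Inverse-distance weights and row standardization for five polygons A-E.
# The distance matrix is invented, not the one in Figure 1.6.
import numpy as np

d = np.array([
    [0., 2., 4., 3., 6.],
    [2., 0., 5., 1., 4.],
    [4., 5., 0., 2., 3.],
    [3., 1., 2., 0., 2.],
    [6., 4., 3., 2., 0.],
])

w = np.zeros_like(d)
off = d != 0                       # skip the diagonal (no self-weight)
w[off] = 1.0 / d[off]              # inverse-distance weights, as in (1.9)

# Row standardization: divide each weight by its row sum, as in (1.10).
w_std = w / w.sum(axis=1, keepdims=True)

print(np.round(w_std, 3))
print(w_std.sum(axis=1))           # every row sums to 1
print(w_std.sum(axis=0).round(3))  # column totals: relative interaction
```

After standardization every row sums to 1 regardless of how many close neighbors a polygon has, while the column totals can still be compared to judge which polygon interacts most with the others.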
1.10 Chapter Concluding Remarks
• The necessary steps of a spatial analysis workflow are describe, explore
and explain.
• Spatial data might be points, lines, polygons or fields (i.e., pixels).
• Spatial data are special and thus must be treated differently.
• Spatial autocorrelation is a major reason why classic statistical
approaches cannot be used with spatial data.
• The appropriate spatial relationship conceptualization method depends
largely on the problem studied.
• A wrong selection of conceptualization method might yield unreliable
statistical results.
• A very large or very small fixed distance might distort the results and lead
to misinterpretation.
• Each of the conceptualization methods presented in this chapter assumes
that the effects of distance are continuous and uniform in every direction
(isotropic). However, this is not always the case, as spatial heterogeneity
is evident in reality.
• Having a non-isotropic or noncontinuous space does not prohibit the
use of the preceding methods. It simply indicates that we should
select the appropriate ones.
• The geographical scale determines the data representation, data collec-
tion, methods, research questions and outcomes. For this reason, setting
the right scale is usually the first task in any geographical analysis.
• The scale of analysis defines the size and shape of the region over which
spatial statistics are calculated, once the geographical scale has been set.
It is essentially the level at which spatial phenomena are understood and
is closely related to the problem in question.
• Smaller distances are usually more suitable for geographical analysis at
the local scale.
• The scale of analysis can also be considered the geographical extent over
which spatial and temporal variability is analyzed through the construction
of an appropriate spatial weights matrix.
• We should not assume that a statistical relationship that holds at an aggre-
gated level describes the behavior of individuals (the ecological fallacy).
• The influence of a spatial object on its neighbors is not only a matter
of distance. We thus have to include parameters other than distance effects,
which makes the geographical analysis more complete and accurate.
• Row standardization is nearly always suggested in the case of polygons.
Questions and Answers
The answers given here are brief. For more thorough answers, refer back to the
relevant sections of this chapter.
Q1. What are spatial analysis and geospatial analysis? Is there any difference
between these terms?
A1. Spatial analysis is a collection of methods, statistics and techniques that
integrates concepts such as location, area, distance and interaction to
analyze, investigate and explain, in a geographic context, patterns,
actions or behaviors among spatially referenced observations that arise
as a result of a process operating in space. Geospatial analysis is the
collection of spatial analysis methods, techniques and models that are
integrated into geographic information systems (GIS). Geospatial analysis
is enriched with more advanced GIS capabilities and geoinformation
applications than spatial analysis and is used to design new models
or integrate existing ones in a GIS environment. "Geospatial analysis"
can be used as an alternative term for "spatial analysis"; strictly speaking,
however, spatial analysis is part of geospatial analysis but focuses more
on the methods than on the technologies.
Q2. What are the steps of a spatial analysis approach as presented in
this book?
A2. Step A: Describe (What). This is the first step of a spatial analysis process
and involves describing the dataset through descriptive statistics. Step B:
Explore (Where). In the second step, exploratory spatial data analysis
(ESDA) is used to explore and map data, locate outliers, test underlying
assumptions and identify associations among the data, such as spatial auto-
correlation or spatial clustering. In this step, we mostly answer "where?"
questions. Step C: Explain (Why/How). In the last step, explanatory
statistical analysis through a spatial lens is applied to explain and under-
stand causes and effects using models. In this step, we attempt to answer
"why?/how?" questions.
Q3. Based on the stages of a spatial analysis approach to solve a given
problem, what are the key questions that spatial analysis attempts to address?
A3. Spatial analysis attempts to answer three basic sets of questions: What,
Where, and How/Why. In the "What" set of questions, we study the
status of specific variables. For example, what is the mean, maximum or
minimum value of a variable? Along with exploratory spatial data analysis
and related mapping techniques, this offers a first indication of how vari-
ables are distributed in space. Then, we ask "Where" questions. Where
are the areas with low/high values of income? Is there any spatial cluster-
ing in the distribution of income per capita? Where is it located? Is there
a crime hot spot? Where is it? These questions provide a solid analysis
based on spatial statistics that quantify results as significant or not
and identify interesting spatial patterns that are not detectable other-
wise. Finally, the analysis delves deeper by identifying and modeling
relationships through "How/Why" questions. Models can be
created that reveal how, for example, the value of a variable is related
to its location.
Q4. What are the three levels of measurement on a variable/attribute? Give
an example of each, and name their main differences.
A4. Variables/attributes are grouped at three levels of measurement: nom-
inal, ordinal and ratio. Nominal variables have values that cannot be
ordered. For example, race may be coded as White = 1, Asian = 2 and
Hispanic = 3. Nominal variables cannot be added or subtracted.
Ordinal variables have categories that can be ordered but whose
numerical differences are not meaningful and cannot be calculated. For
example, the variable "Student" might take the following values:
"Exceptional" = 1, "Good" = 2, "Need to study harder" = 3. This is an
ordinal variable, as we can order categories from top to bottom (or vice
versa), but there is no meaning in subtracting ("Exceptional" −
"Good" = −1). Ratio variables are those for which each observation can be
expressed in a numerically meaningful way. For example, population is a
ratio variable.
Q5. What is the conceptualization of spatial relationships? Why is it
important?
A5. Conceptualization of spatial relationships is the modeling of the relation-
ships and interactions between features across space. Put simply, it is the
mathematical definition of the terms near, far, adjacent, contiguity,
neighborhood, neighboring and distance for a set of spatial objects,
using specific values or functions. Spatial analysis techniques and calcu-
lations are based on how the spatial relationships among spatial objects
have been modeled. Conceptualizing spatial relationships is important
because the closer to reality a conceptualization of spatial relationships
is, the more accurate the outcomes of the statistical tests or models will
be. On the other hand, if the applied conceptualization method fails to
reflect the inherent structure of the spatial relationships in the dataset,
the analysis outcomes will be misleading.
Q6. What are the sphere of influence and the zone of indifference?
A6. A sphere of influence is a fixed distance value whereby all the spatial
entities inside the zone it creates are weighted equally, while spatial
entities outside the zone receive zero weight. It can be applied as a
cutoff point in a distance decay function. A zone of indifference is a
distance band applied prior to a distance decay function. All objects
inside this zone are treated equally; beyond this distance, objects are
weighted according to the distance decay.
Q7. What is a neighborhood, and why is it important in spatial analysis?
A7. A neighborhood in the spatial analysis context is a geographically local-
ized area to which local spatial analysis and statistics are applied, based
on the hypothesis that objects within the neighborhood are likely to
interact more than those outside it. Defining the appropriate neighbor-
hood is necessary for the accurate performance of spatial statistics. Most
of these statistics require a neighborhood definition and the construction
of a spatial weights matrix that reflects the intensity of the relationships
among the spatial entities in this neighborhood.
Q8. What are Thiessen polygons and Delaunay triangulation? Why are
they used?
A8. Thiessen polygons, also called "Voronoi polygons," are proximity poly-
gons. Delaunay triangulation partitions space by creating triangles from
point features or polygon centroids whose proximity polygons share an
edge. Both methods are used to divide space into regions with specific
attributes that can be used to define neighborhoods, calculate spatial
interpolation or estimate catchment areas for public services or commer-
cial businesses.
Q9. What are spatial weights and a spatial weights matrix? Why are
they used?
A9. Spatial weights are numbers that reflect some sort of distance, time or
cost between a target spatial object and every other object in the dataset
or specified neighborhood. Spatial weights quantify the spatial or spa-
tiotemporal relationships among the spatial features of a neighborhood.
A spatial weights matrix stores the weights and is used to depict the
degree of connection among the objects inside a specific neighborhood.
Q10. What is row standardization, and when should we apply it?
A10. Row standardization is the process of scaling the spatial weights to a
range between 0 and 1. In row standardization, each weight for a spatial
object is divided by either the sum of all weights in the row or the sum of
the weights of the neighboring features for this spatial object. By doing so,
we adjust our results so that the number of neighbors has no effect on
the final results. Row standardization is recommended mainly in two
cases: (a) when there is a potential for bias in the distribution of spatial
objects and their attribute values due to a poorly designed sampling
procedure and (b) when data have been aggregated from larger datasets
into polygons of different sizes.

LAB 1
THE PROJECT: SPATIAL ANALYSIS FOR REAL ESTATE MARKET
INVESTMENTS
Each lab consists of two sections, namely Section A (ArcGIS) and Section
B (GeoDa). Section A provides step-by-step instructions on how to solve the lab
exercises using ArcGIS. Interpretation of the results and concluding remarks are
also presented. Section B applies GeoDa functionalities to solve the same exer-
cises. As such, readers may opt to solve the lab's exercises using either leading
commercial software or well-established open-source freeware. The interpret-
ation of the results as well as the conclusions related to the analysis are not
repeated in Section B, as they are presented in the related Interpreting Results
paragraphs of Section A. The reader should study these sections carefully, as they
are independent of the software used. In addition, Overall Progress and Scope of
Analysis sections precede Sections A and B to offer a better understanding of the
spatial analysis process as well as the motivation for the analysis in each lab.
Overall Progress
Scope of Analysis
A realtor wants to offer its clientele more than just a listing of its properties.
As the company's motto is "location, location, location," it
Figure 1.7 Lab 1 workflow and overall progress.

provides geographical and socioeconomic data (for example, census data,
income data, crime data) and offers location analytics through advanced spatial
statistics and GIS. By conducting such an analysis, the company aims to become
more competitive and more reliable by answering tailored questions from
each client regarding their unique investment needs. The more information
available on a location and its surrounding areas, the higher the probability of a suc-
cessful investment. Spatial analysis offers the tools and mathematical back-
ground required to provide quantitative answers, visualized through GIS maps,
that are usually better than personal opinions and general beliefs about a place.
This project deals with the following task: an investor seeks the best location to
establish a successful new coffee shop and turns to the real estate company for
advice and consultation.
Excluding rent and related costs (see Box 1.4), the investor is primarily
interested in finding an appropriate neighborhood based on the following
objectives (see Table 1.2):
1. As the service provided would be of premium quality, the coffee shop
should be located in an area whose residents (i.e., potential clients/target
group) have high annual incomes.
2. The coffee shop should be located in an area of low crime.
3. The target group should be likely to spend more money than average on
expenses, including coffee-related products. High spenders should be
identified.
4. The socioeconomic drivers behind people's monthly expenses (includ-
ing those for coffee-related services) should be identified.
The fourth objective is not directly linked to finding an optimal location as
described in the first three main objectives. It focuses on identifying the spatial
relationships that can be used for modeling, market penetration and clientele
analysis. Such an analysis should include detailed variables such as consump-
tion preferences, everyday habits, type of job and the amount of money spent
on coffee in coffee shops. For educational reasons and to keep the analysis
brief, we will focus on primary socioeconomic variables.
Box 1.4 A complete study would also include such factors as the location of
competitors, rent and related costs, access to public transport (i.e., subway), the
budget of the investment, the number of daily passersby, the size of the
Table 1.2 Project objectives and methods per lab.
No | Objectives | Methods | Lab
1 | Location: High income | Mapping, Spatial autocorrelation | 2, 4, 5
2 | Location: Low crime | Mapping, Centrographics, Spatial autocorrelation | 3, 4, 5
3 | Clustering: High spenders | Clustering | 5
4 | Modeling: Identify drivers | Regression, Spatial regression | 2, 6, 7

Box 1.4 (cont.)
permanent population and the number of people working in nearby offices. It
would be infeasible to answer all these questions within a book, and this project
addresses only those dealing with space. The results can then be integrated into
a market analysis conducted by marketers or business specialists to build a robust
business and spatial plan. Spatial planning is key for success, not only in business
but also in the implementation of national, regional and local policies on various
issues, such as education, health, labor, emergencies and public administration.
In our case study, although the four objectives/questions seem simple, the analy-
sis might prove endless (which is possible in spatial analysis). We will concentrate
on a relatively large set of important questions and provide advanced modeling
options. To address these questions, we will use many spatial analysis methods,
such as exploratory spatial data analysis, spatial autocorrelation, data clustering,
spatial clustering and spatial regression (see Tables 1.2 and 1.3).
The spatial analysis will go through three steps – Describe, Explore and
Explain – while answering three basic questions: "What?" "Where?" and "How/
Why?" (see Figures 1.1 and 1.7).
• For the "What?" set of questions, we will study the status of specific
variables. For example, what is the mean income of the study area? What
is the population with incomes higher than a specific value? Are there
income outliers? This provides an initial understanding of the socioeco-
nomic profile of the study area. Combined with exploratory spatial data
analysis and related mapping techniques, it will offer a preliminary indi-
cation of how the variables are distributed in space.
• Then, we will ask "Where?" questions. Where are the areas with low/high
income? Are there spatial clusters of areas with high income values? Is there
a crime hot spot? These types of questions will provide a solid analysis based
on spatial statistics that quantify results as significant or not and also
identify interesting spatial patterns that would not be detectable otherwise.
• Finally, the analysis will delve deeper by answering "How/Why?" ques-
tions. Several regression and econometric models will be created to
model monthly expenditures (the dependent variable) based on a set of
independent variables (e.g., location, income).
Data
The study area is the city of Athens, Greece (referred to as the "city" hereafter).
The spatial data refer to the postcodes of the city (polygons; see Table 1.4). The
socioeconomic data refer to the census (see Table 1.5). Some of the census
data are original, and some have been rescaled for reasons of confidentiality.
Tasks
The main analysis tasks are presented in Table 1.3.

Table 1.3 Project tasks per lab following the describe–explore–explain workflow.

Task | Lab | Tools to perform task | Why perform task

What (describe)
Dataset and study area. Create and map ratios | 1 | Symbology | Define the problem, the study area and the database. To easily map ratios (e.g., population density).
Describe and map variables. Calculate and map z-scores | 2 | Choropleth maps; Histograms; Basic statistics (skewness, kurtosis, etc.); Normal QQ plot; Boxplots; Z-score rendering | To locate areas of high or low income. To find if distributions are skewed. To identify if distributions follow the normal distribution. To identify outliers.
Conduct correlation and pairwise correlation analysis on the dataset variables | 2 | Scatter plots; Scatter plot matrix | To identify if linear relationships exist among the variables. This will make modeling easier in a later step.

Where (explore)
Analyze and measure the geographic distribution of crime events in the study area | 3 | Standard deviational ellipse | To find if directional or temporal trends in crime locations exist. This will help in better assessing the optimal location of the coffee shop.
Point pattern analysis | 3 | Average nearest neighbor; Ripley's k | To identify if the spatial pattern of crimes is random, dispersed or clustered. To find at which distance clustering or dispersion is more pronounced.
Create density maps | 3 | Kernel density estimation | To create a smooth map covering the study region depicting high or low densities of crime occurrences.
Locational outliers | 3 | Feature to point; Near | To identify if locational outliers exist and remove them to calculate spatial statistics.
Identify if spatial autocorrelation of income exists | 4 | Spatial weights matrix; Global Moran's I; Incremental spatial autocorrelation; Local Moran's I; Getis-Ord G* | To conceptualize space by creating the spatial weights matrix. To identify if high- or low-income values cluster in space. To locate hot or cold spots. To identify the scale of the analysis.
Identify if spatial autocorrelation of crime events exists | 4 | Optimized hot spot analysis | To identify in an optimized way if hot spots or cold spots of crime events exist. Hot spots of crime should be excluded from the potential locations for the coffee shop.
Multivariate data clustering | 5 | k-means clustering | To conduct geodemographic analysis based on a variety of socioeconomic variables.
Spatially constrained multivariate clustering | 5 | SKATER | Spatial clustering (regionalization).
Similarity analysis | 5 | Similarity search (cosine similarity) | To identify postcodes similar to a target one as alternative potential locations.
Synthesis | 5 | Select by attributes/location; Export; Reclassify | To identify the best location based on the evaluation criteria.

How/Why (explain)
Modeling relationships | 6 | Exploratory regression; Ordinary least squares; Geographically weighted regression | To model expenditures (the dependent variable) based on a set of independent variables, identifying the factors that increase expenditures.
Modeling relationships | 7 | Spatial lag, spatial error, spatial regimes | To model expenditures (the dependent variable) based on a set of independent variables utilizing spatial econometrics. To identify if expenditures are linked to spatial variables.

Dataset Structure
The structure of the datasets and related files under the folder BookLabs is
shown in Figure 1.8 (see also Box 1.5).
Box 1.5 Download the Lab data from www.cambridge.org/… and
save them to I:\BookLabs\ (you can save the data to another location if you
prefer, such as C:\).
The BackUpData folder stores the original data and serves as the backup of the
data stored in the folder Data. In case of corrupted, accidentally deleted or
wrongly edited data, copy the dataset from the BackUpData folder and paste it
into the Data folder. Each subsequent folder (e.g., Lab1) stores the .mxd files for
each specific lab and the Output and Solved folders. The Output folder is used to
save the output files of your analysis, such as shape files, graphs, PDFs or
images. This is the main folder of your data analysis. The Solved folder provides
the solved exercise (e.g., Solved_Lab1_GettingToKnowDataSet.mxd) along with
the final dataset after any tools have been applied and edits made. Use this
folder to compare with your results. An additional folder, GeoDa, is used only in
exercises solved with GeoDa software (see Section B).
The spatial data (stored in the Data folder) used in this book are described in
Table 1.4.
The attribute data of the City shape ﬁle are described in Table 1.5 .
Figure 1.8 Dataset structure.
Table 1.4 Spatial data.
Files          Depicting
City.shp       90 postcodes (polygons) constituting the case study area
Downtown.shp   Outer polygon of the downtown area of the city
Assaults.shp   Point events of assault crimes
Burglaries.shp Point events of burglary crimes
Crime.shp      Point events of crime (both assaults and burglaries)
44 Think Spatially

--- Page 62 ---
Guidelines
This font and the term "ACTION:" indicate interactions with the software.
Folders, variables and file names will also be written in this font.
The symbols used to explain actions are shown in Table 1.6 .
Section A ArcGIS
Exercise 1.[REDACTED ADDRESS]udy Region
This exercise describes how a population can be mapped and how popula-
tion density can be calculated and rendered.
ArcGIS Tip: All mxd files have been created using ArcGIS[REDACTED PHONE] version.
If an earlier version is installed, open an empty mxd file and
insert the shape files of each exercise from the Data folder.

Table 1.5 Socioeconomic data refer to the[REDACTED PHONE] census (rescaling has been applied for confidentiality).
These variables are the attribute fields of City.shp.
Attributes Description
Population Total population (persons)
Density    Population density (persons per square meter)
Foreigners Percentage (%) of the population of foreign (other than Greek) nationality
Owners     Percentage (%) of the population owning a house (not paying rent)
SecondaryE Percentage (%) of the population with secondary education or less
University Percentage (%) of the population with a bachelor's degree
PhD_Master Percentage (%) of the population with a master's degree or higher
Income     Average annual income per capita (in euros)
Insurance  Average monthly insurance cost per capita (in euros)
Rent       Average monthly rent (in euros)
Expenses   Average monthly per capita expenses for daily purchases (in euros; e.g., groceries, coffee)
Area       Area of postcode in square meters
Postcode   Five-digit unique ID
Table 1.6 Basic symbols used in explaining interaction with software.
Symbols Meaning
>       Next action
TOC     Table of contents
RC      Right-click
DC      Double-click
TAB =   Select TAB
=       Set value

--- Page 63 ---
Exercise 1.1 (cont. )
ArcGIS Tools to be used: Symbology, Zoom tools, Table of contents,
Attribute table, Normalization
ACTION: Open dataset and map population
Navigate to the location you have stored the book dataset and
double click Lab1_GettingToKnowDataSet.mxd
For example: I:\BookLabs\Lab1\Lab1_GettingToKnowDataSet.mxd
Tip: You can type this path directly into the Windows Explorer address bar (just
change the drive letter; if you stored the data on C, change I to C).
First, save the original ﬁle with a new name:
Main Menu >File >Save As >My_Lab1_GettingToKnowDataSet
In I:\BookLabs\Lab1\Output
TOC (Table of contents) >RC (Right-click) the City layer >Open
Attribute Table (see Figure 1.9)
Figure 1.9 The case study area.

--- Page 64 ---
Exercise 1.1 (cont. )
Figure[REDACTED PHONE] A total of 90 postcodes with 17 ﬁelds are stored in the attribute table, of
which 11 are socioeconomic variables (see Table 1.5).
Figure[REDACTED PHONE] Layer properties dialog box for setting symbology.

--- Page 65 ---
Exercise 1.1 (cont. )
TOC >RC City >Properties >TAB = Symbology >Quantities >
Graduated colors >Value = Population (see Figure[REDACTED PHONE])
Color Ramp = Yellow to Brown
Click Classify
Enter the following values in the Break Values window at the lower right
(see Figure[REDACTED PHONE]):
Break Values >[REDACTED PHONE] >Enter >[REDACTED PHONE] >Enter >[REDACTED PHONE] >Enter >
[REDACTED PHONE] >Enter >[REDACTED PHONE] >Enter >OK
RC Label >Format Labels >Select Numeric >Rounding >Number
of decimal places = 2 >OK (see Figure[REDACTED PHONE])
Click Apply >OK
TOC >RC City >Save As Layer File >(see Figure[REDACTED PHONE])
Name = Population.lyr
In I:\BookLabs\Lab1\Output
Add the layer in the TOC.
Save
Figure[REDACTED PHONE] Setting categories range values.

--- Page 66 ---
Exercise 1.1 (cont. )
Figure[REDACTED PHONE] Deﬁning number format.
Figure[REDACTED PHONE] Population choropleth map.

--- Page 67 ---
Exercise 1.1 (cont. )
Interpreting results: The case study area consists of 90 postcodes (spatial
features; see Figure 1.9). By opening the attribute table of City, we inspect
the stored variables and the corresponding values for each spatial feature (see
Figure[REDACTED PHONE]). Postcodes in downtown have a lower population than the
postcodes in the outskirts (see Figure[REDACTED PHONE]). Because the central postcodes
are smaller in size, it is advisable to also depict population density, which
gives a better picture of how population is distributed within the study area.
ACTION: Calculate and map population density
RC the City layer (not the Population.lyr) >Properties >TAB =
Symbology >Quantities >Graduated colors
Value = Population
Normalization = Area
Color Ramp = Light Green to Dark Green
Classes = 4
Click Classify >Break Values >[REDACTED PHONE] >Enter >[REDACTED PHONE] >Enter
>[REDACTED PHONE] >Enter[REDACTED PHONE] >OK
Density and break values refer to population per square meter.
In practice, [REDACTED PHONE] means that [REDACTED PHONE] people live within 1 m², or
1 person per 100 m².
RC Label >Format Labels >Numeric >Number of decimal places =
2>OK>Apply >OK
TOC >RC City >Save As Layer File >
Name = PopDensity.lyr
In I:\BookLabs\Lab1\Output
Add the layer in the TOC.
Main Menu >File >Save
Tip: Saving Population normalized by area into a layer file (.lyr) saves the
density representation. When you add a layer in the table of contents, it is
given the name of the original shape file created (City in this example) and
not the name under which it was saved (i.e., PopDensity.lyr; see Figure[REDACTED PHONE]).
Interpreting results: The choropleth map of population density depicts lower
densities (for most postcodes) in the city center (downtown; red polygon),
which grow larger as we move outward (see Figure[REDACTED PHONE]). We locate a cluster
of densely populated postcodes in the northern part of the city. On the
other hand, population density is lower in the downtown area, probably because
of its business and historic character (with fewer permanent residents).
Similarities with the population map (see Figure[REDACTED PHONE]) can be identified
but, overall, the population density map offers better insight into how
population is distributed across the postcodes. For example, in the downtown
area, postcodes are described in more detail by population density than by
population counts.
ArcGIS tip: The normalization procedure in ArcGIS is used to divide one
variable by another. This offers the ability to calculate rates of change
(e.g., population increase), percentages (e.g., land cover share), per capita
numbers (e.g., income per capita) and densities (e.g., population density).
It should not be confused with the normalizing process that rescales data
to a range of [0,1] or [−1,1] (see Section 2.4). The normalization tool in ArcGIS
is an adjustment that divides two variables. For example, if we have the
aggregated income of all people living in each postcode, we can calculate
and map the per capita income. A drawback of this tool is that we cannot
obtain the values of the population density in a new field; we simply
map the results. We can, however, easily produce this ratio using field
calculator procedures. The normalization tool in ArcGIS is very useful when we
need to test various combinations of ratios. Once we decide which of
the tested ratios to retain, we can calculate the values using the field
calculator.

Figure[REDACTED PHONE] Population density.
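Since the Normalization option only renders the ratio without storing it, the field-calculator step the tip alludes to is easy to sketch outside ArcGIS. A minimal sketch in plain Python; the records and field names (Population, Area, IncomeSum) are hypothetical illustrations, not the book's dataset:

```python
# Minimal sketch (hypothetical values) of the ratios the Normalization option
# maps: density = Population / Area, per capita income = IncomeSum / Population.

postcodes = [
    {"Postcode": "11251", "Population": 9000, "Area": 900_000, "IncomeSum": 135_000_000},
    {"Postcode": "11252", "Population": 4000, "Area": 1_600_000, "IncomeSum": 48_000_000},
]

for p in postcodes:
    # Persons per square meter (the unit used by the break values in the exercise)
    p["Density"] = p["Population"] / p["Area"]
    # Aggregated income divided by population gives per capita income
    p["IncomePC"] = p["IncomeSum"] / p["Population"]

for p in postcodes:
    print(p["Postcode"], round(p["Density"], 4), round(p["IncomePC"], 1))
# prints: 11251 0.01 15000.0
#         11252 0.0025 12000.0
```

Unlike the Normalization option, this stores the ratio as a new field, which is exactly what the field calculator achieves inside ArcGIS.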
Section B GeoDa
Box 1.6 Download and install the free and open-source GeoDa software
from [REDACTED URL]. Also browse the documentation
section, where you can find a detailed workbook. GeoDa is developed by
Dr. Luc Anselin and his team, whose contribution to the spatial analysis
field is paramount.
Exercise 1.[REDACTED ADDRESS]udy Region
This exercise describes how a population can be mapped and how popula-
tion density can be calculated and rendered.
GeoDa Tools to be used: Category editor ,Zoom tools ,Table
ACTION: Open dataset and map population
Navigate to the location you have stored the book dataset and
click Lab1_GettingToKnowDataSet_GeoDa.gda inside the GeoDa
folder
For example:
I:\BookLabs\Lab1\GeoDa\Lab1_GettingToKnowDataSet_GeoDa.gda

--- Page 70 ---
Exercise 1.1 (cont. )
Tip: You can type this path directly into the Windows Explorer address bar
(just change the drive letter; if you stored the data on C, change I to
C). Spatial data for the GeoDa exercises are stored in the GeoDa folder and not
in the Data folder. For spatial data and attribute values, see Tables 1.4 and 1.5.
Main Menu >Click the Table icon (see Figures[REDACTED PHONE] and[REDACTED PHONE]).
Click then on the Map-CityGeoDa window to activate it.
Main Menu >Map >Custom Breaks >Create New Custom Breaks >
(see Figure[REDACTED PHONE])
On the Variable Settings window ( see Figure[REDACTED PHONE] ) select:
Population >OK
On the window ‘New Custom Categories Title ’type: Custom Breaks
(Population) and click OK
Figure[REDACTED PHONE] The case study area in GeoDa.

--- Page 71 ---
Exercise 1.1 (cont. )
Figure[REDACTED PHONE] A total of 90 postcodes with 15 fields are stored in the attribute table, of which 11 are socioeconomic
variables (see Table [REDACTED PHONE]).

--- Page 72 ---
Exercise 1.1 (cont. )
Figure[REDACTED PHONE] Creating a choropleth population map.
Figure[REDACTED PHONE] Variables selection dialog box.

--- Page 73 ---
Exercise 1.1 (cont. )
On the Category Editor, change only the following fields: Breaks
= User Defined (see Figure[REDACTED PHONE])
Categories = 5
Type the following values directly in the break fields: break 1
= [REDACTED PHONE] / break 2 = [REDACTED PHONE] / break 3 = [REDACTED PHONE] / break 4 = [REDACTED PHONE] /
break 5 = [REDACTED PHONE] >Close the dialog box
The map is updated (see Figure[REDACTED PHONE])
Main Menu: Save
Figure[REDACTED PHONE] Category editor dialog box.

--- Page 74 ---
Exercise 1.1 (cont. )
Interpreting results: See Section A.
Tip: The interpretation of the results, as well as the conclusions related to the
analysis, is not repeated here, as it is already presented in Section A in
the "Interpreting results" paragraphs of each exercise. The reader should
study these sections carefully, as they are independent of the software used.
ACTION: Calculate and map population density
Main Menu >Options >Rates >Raw Rate
Event Variable = Population (see Figure[REDACTED PHONE])
Base Variable = Area
Map Themes = Natural Breaks
Categories = 4 >OK (see Figure[REDACTED PHONE])
Save
Figure[REDACTED PHONE] Population choropleth map.

--- Page 75 ---
Exercise 1.1 (cont. )
Interpreting results: See Section A.
Figure[REDACTED PHONE] Setting the population density.
Figure[REDACTED PHONE] Population density map.

--- Page 76 ---
[REDACTED ADDRESS]atistics
THEORY
Learning Objectives
This chapter deals with

• The notion of exploratory spatial data analysis
• The presentation of descriptive statistics
• Spatial statistics and their importance in analyzing spatial data
• Analyzing univariate data
• Simple exploratory spatial data analysis tools, such as histograms, boxplots
  and other visual methods, used to gain deeper insight into spatial datasets
• Bivariate analysis
• Correlation and pairwise correlation
• Normalization, rescaling and adjustments
• Introducing basic notions of statistical significance tests
• The importance of hypothesis setting in a spatial context
• The importance of the normal distribution in classic statistics and how it is
  integrated into spatial analysis
After a thorough study of the theory and lab sections, you will be
able to
• Have a solid knowledge of descriptive statistics
• Use descriptive statistics for univariate analysis
• Understand and use exploratory spatial data analysis techniques to map
  and analyze variables attached to spatial objects
• Create plots, link them to maps and identify interesting data patterns
• Conduct bivariate analysis and identify whether two variables are linearly
  related; use plots to further examine their relation
• Rescale data to make comparisons between variables easier and also
  allow for better data handling
• Apply ESDA tools through ArcGIS and GeoDa

--- Page 77 ---
2.1 Introduction to Exploratory Spatial Data Analysis, Descriptive Statistics,
Inferential Statistics and Spatial Statistics
Deﬁnitions
Exploratory Spatial Data Analysis (ESDA) is a collection of visual and
numerical methods used to analyze spatial data by

(a) Applying classical nonspatial descriptive statistics that are dynamically
    linked to GIS maps and spatial objects
(b) Identifying spatial interactions, relationships and patterns through the
    use of a spatial weights matrix (defined by the appropriate
    conceptualization method), hypothesis testing and various metrics

ESDA methods and tools are used to

• Describe and summarize spatial data distributions
• Visualize spatial distributions
• Examine spatial autocorrelation (i.e., trace spatial relationships and
  associations)
• Detect spatial outliers
• Locate clusters
• Identify hot or cold spots

Descriptive statistics is a set of statistical procedures that summarize
the essential characteristics of a distribution through calculating/plotting:

• Frequency distribution
• Center, spread and shape (mean, median and standard deviation)
• Standard error
• Percentiles and quartiles
• Outliers
• Boxplot graph
• Normal QQ plot
Inferential statistics is the branch of statistics that analyzes samples to draw
conclusions for an entire population.
Spatial statistics employ statistical methods to analyze spatial data, quantify a
spatial process, discover hidden patterns or unexpected trends and model
these data in a geographic context. Spatial statistics are largely based on
inferential statistics and hypothesis testing to analyze map patterns so that
spatially varying phenomena can be better modeled (Fischer & Getis[REDACTED PHONE]
p. 4). Unlike nonspatial methods, spatial statistics use spatial properties such
as location, distance, area, length and proximity directly in their mathematical
formulas (Scott & Janikas[REDACTED PHONE] p. 27). Spatial statistics quantify and further map
what the human eye and mind intuitively see and do when reading a map that
depicts spatial arrangements, distributions, processes or trends (Scott &
Janikas 2010 p. [REDACTED PHONE]).
                    

--- Page 78 ---
Why Use Descriptive Statistics and ESDA
Describing a dataset is usually the first task in any analysis. It quickly provides
an understanding of data variability and allows for the identification of possible
errors (e.g., a value that is not acceptable, omissions [blank cells] or outliers
[scores that differ excessively from the majority]). To describe a dataset, we use
descriptive statistics (also called "summary statistics"). Typical questions that
descriptive statistics may address in a geographic context include the
following: What is the average income in a neighborhood? What percentage of
people in a postcode have graduated from university? How many customers of
a specific coffee shop live within less than 10 minutes' walking time? What is
their purchasing power, and what is the standard deviation of their income?
Descriptive statistics are useful for calculating specific characteristics (e.g., the
average or standard deviation), thus providing insight into data distributions.
However, they do not provide linkages between the results and the spatial
objects arranged on a map. The main characteristic of ESDA tools is that they
are dynamically linked to maps in a GIS environment. For example, when a
point in a scatter plot is brushed (selected), a spatial object is also highlighted
on the corresponding map. Likewise, when spatial objects are brushed on the
map, the relevant points/areas/bars are highlighted in the graphs. The basis of
exploratory spatial data analysis is the notion of spatial autocorrelation (see
Chapter 4), whereby spatial objects that are closer together tend to have similar
values (in one or more attributes). As such, ESDA offers a more sophisticated
analysis, as it discovers patterns in data through mapping and statistical
hypothesis testing (see Chapter 3).
ESDA's strength rests on two major features (Dall'erba[REDACTED PHONE]; Haining et al.
[REDACTED PHONE] p. [REDACTED PHONE]):

• ESDA extracts knowledge based on its data-mining capacity, as the
  information that the attribute values carry is relevant to the location of
  the data. This is extremely useful when no prior theoretical framework
  exists, for example, in many interdisciplinary social science fields.
• ESDA utilizes a wide range of graphical methods combined with
  mapping, making the analysis more accessible to people who are not
  accustomed to model building.
Descriptive statistics are used in conjunction with ESDA tools. Sometimes, the
boundaries between them are unclear, at least for simple tools. For this reason,
some books include histograms, scatter plots or boxplots under descriptive
statistics and others under ESDA. The distinction is not of major importance as
long as one understands how each tool works. In essence, the only difference
with simple ESDA tools (e.g., histograms, scatter plots, boxplots) is that they
offer the ability to link graphs to spatial objects, which enhances their power
when used in research analysis (Fischer & Getis[REDACTED PHONE] p. 3). In this book,
simple tools such as histograms, scatter plots and boxplots are presented from the spatial
                    

--- Page 79 ---
analysis perspective and are linked to GIS maps. More advanced ESDA topics
that focus on both spatial and attribute association (e.g., point pattern
analysis, spatial autocorrelation) are presented in Chapters 3 and 4. Broadly,
simple ESDA tools can be used prior to the modeling phase, and advanced
ESDA tools can act as model builders to identify spatial relationships and
hidden patterns in spatial data (Fischer & Getis[REDACTED PHONE] p. 3).
Why Use Spatial Statistics
Spatial statistics can be considered part of various spatial
analysis methods such as ESDA, spatial point pattern analysis, spatial
clustering and spatial econometrics. Spatial statistics are mainly used to

• Analyze geographic distributions through centrographic measures (see
  Chapter 3). In a way similar to descriptive statistics, geographic
  distributions can be measured to analyze their mean center and standard
  distance. Spatial statistics are calculated based on the location of each
  feature; this is a major difference from their counterpart descriptive
  statistics, which refer solely to the nonspatial attributes of the spatial
  features. Although spatial statistics related to measuring geographic
  distributions can be weighted by an attribute value, the results refer to a
  spatial dimension. The spatial features used are typically points and
  polygons (centroids).
• Analyze spatial patterns. Spatial statistics can be used to analyze the
  pattern of a spatial arrangement. When this arrangement refers to point
  features, the analysis is called point pattern analysis. Through such
  analysis, we determine whether a point pattern is random, clustered or
  dispersed (Chapter 3). The analysis of the spatial pattern that the
  attribute values (of spatial features) form in space is part of the spatial
  autocorrelation analysis examined in Chapter 4.
• Identify spatial autocorrelation, hot spots and outliers (see Chapter 4).
• Perform spatial clustering (see Chapter 5).
• Model spatial relationships. Spatial statistics can also be used to identify
  the associations and relationships between attributes and space;
  examples include spatial regression methods and spatial econometric
  models (analyzed in Chapters 6 and 7).
• Analyze spatially continuous variables such as temperature, pollution,
  soils, etc. In general, the type of spatial statistical analysis dealing with
  continuous field variables is named "geostatistics" (O'Sullivan & Unwin[REDACTED PHONE] p. [REDACTED PHONE]). Geostatistics focuses on describing the spatial
  variation in a set of observed values and on predicting values at
  unsampled locations (Sankey et al. [REDACTED PHONE] p. [REDACTED PHONE]).
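The centrographic measures in the first bullet are straightforward to sketch: for points (x, y), the mean center is the average of the coordinates, and the standard distance is the square root of the average squared distance from that center. A minimal sketch in plain Python, with invented coordinates for illustration:

```python
import math

# Illustrative point coordinates (projected units, e.g., meters)
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

# Mean center: the average of the x coordinates and of the y coordinates
mx = sum(x for x, _ in points) / len(points)
my = sum(y for _, y in points) / len(points)

# Standard distance: sqrt of the mean squared distance from the mean center
sd = math.sqrt(sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points) / len(points))

print((mx, my), round(sd, 3))  # prints: (2.0, 1.5) 2.5
```

Each corner of this 4 x 3 rectangle lies exactly 2.5 units from the center, so the standard distance is 2.5; weighting by an attribute (as the bullet notes) would simply turn these averages into weighted averages.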
Spatial statistics are built upon statistical concepts, but they incorporate
location in terms of geographic coordinates, distance and area. They extend
classic statistical measures and procedures and offer advanced insights in analyses of
                    

--- Page 80 ---
data. In geographical analysis, spatial statistics are not used separately from
classical statistics but complement them. However, there is a fundamental
difference between classical and spatial statistics. In classical statistics, we
make a basic assumption regarding the sample: it is a collection of independent
observations that follow a specific, usually normal, distribution. By contrast, in
spatial statistics, because of inherent spatial dependence and the fact that
spatial autocorrelation (usually) exists, the focus is on adopting techniques for
detecting and describing these correlations. In other words, classical statistics
assume observation independence, while in spatial statistics spatial
dependence usually exists, and classical methods should be modified
accordingly to adapt to this condition.
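A toy illustration of the dependence described above: in a smooth one-dimensional "map", each zone's value is strongly correlated with the mean of its neighbors. The values are invented for illustration; proper diagnostics such as Moran's I are covered in Chapter 4.

```python
import math

# Toy one-dimensional "map": 10 zones along a line with deliberately smooth
# values, so nearby zones carry similar values (spatial dependence).
values = [10, 12, 13, 15, 18, 21, 22, 24, 27, 30]

# Pair each interior zone's value with the mean of its two neighbors
pairs = [(values[i], (values[i - 1] + values[i + 1]) / 2)
         for i in range(1, len(values) - 1)]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs, ys = zip(*pairs)
# A value strongly correlated with its neighborhood mean is exactly the kind
# of dependence that violates the independence assumption of classical statistics.
print(round(pearson(xs, ys), 3))
```

For this smooth series the correlation is close to 1; shuffling the values destroys the spatial arrangement and drives it toward 0.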
2.[REDACTED ADDRESS]atistics for Visualizing Spatial
Data (Univariate Data)
This section presents the most common ESDA techniques and descriptive
statistics for analyzing univariate data (only one variable of the dataset is
analyzed each time; bivariate data analysis is examined in the next section).
These include
• Choropleth maps
• Frequency distributions and histograms
• Measures of the center, spread and shape of a distribution
• Percentiles and quartiles
• Outlier detection
• Boxplots
• Normal QQ plot

[REDACTED PHONE] Choropleth Maps
Deﬁnition
Choropleth maps are thematic maps in which areas are rendered according to the
values of the variable displayed (Longley et al. [REDACTED PHONE] p. [REDACTED PHONE]).
Why Use
Choropleth maps are used to obtain a graphical perspective of the spatial
distribution of the values of a specific variable across the study area.
Interpretation
The ﬁrst task when spatial data are joined to nonspatial data (i.e., attributes
from a census) is to map them, creating “choropleth maps. ”For example,
population, population density and income per capita can be rendered in a
choropleth map. There are two main categories of variables displayed in
choropleth maps: (a) spatially extensive variables and (b) spatially intensive
                    

--- Page 81 ---
variables. In spatially extensive variables, each polygon is rendered based on a
measured value that holds for the entire polygon (for example, total
population, total households or total number of children). In the spatially
intensive category, the values of the variable are adjusted for the area or some
other variable. For example, population density, income per capita and the rate
of unemployment are spatially intensive variables because they take the form
of a density, ratio or proportion. Some argue that the first category is not
always appropriate and that variables should be mapped using the second
category (Longley et al. [REDACTED PHONE] p. [REDACTED PHONE]) because, as each
polygon has a different size, mapping raw values directly might be misleading.
For example, a very small and a very large polygon with identical populations
will be rendered in the same color if we map population in absolute values.
However, adjusting population by polygon area, thus depicting population
density, would lead to rendering these two polygons in different colors, as their
density values are very different. Thus, the type of variable used to create a
choropleth map clearly depends on the problem and on the message one
wants to communicate through the mapping.
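The small-versus-large-polygon argument can be sketched numerically. In the hypothetical example below (invented populations, areas and class breaks), two polygons with identical populations fall in the same class when classified by raw counts, but in different classes when classified by density:

```python
# Sketch (hypothetical values): the same population mapped as an absolute
# (spatially extensive) value vs. as a density (spatially intensive) value.
polygons = [
    {"id": "small", "population": 5000, "area_m2": 250_000},
    {"id": "large", "population": 5000, "area_m2": 5_000_000},
]

def classify(value, breaks):
    """Return the index of the first class whose upper bound covers the value."""
    for i, b in enumerate(breaks):
        if value <= b:
            return i
    return len(breaks)

count_breaks = [2500, 5000, 10000]      # classes on raw population
density_breaks = [0.001, 0.005, 0.02]   # classes on persons per square meter

for p in polygons:
    density = p["population"] / p["area_m2"]
    print(p["id"],
          "count class:", classify(p["population"], count_breaks),
          "density class:", classify(density, density_breaks))
```

Both polygons land in the same count class, while their densities (0.02 vs. 0.001 persons per m²) land in different classes, which is the distinction the paragraph describes.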
Through choropleth maps, we visually locate where values cluster or whether
they exhibit similar spatial patterns. We may describe such formations using
expressions such as "In the western part of the study area, variable X has low
scores, while, in the northern part, scores are higher," or "High scores of
variable X are clustered in the city center." This is a descriptive way of reading
a map and the related symbology. There are no statistics yet, but it
communicates a great deal. It may even be better than many statistical
analyses, since maps often speak for themselves, provided that the maps and
symbols are accurate. Nevertheless, scientific analysis must always be
accompanied by statistical analysis in order to support the findings in a
statistically sound way. The next step in mapping variables through a
choropleth map is to apply descriptive statistics to summarize the data and use
exploratory spatial data analysis methods to visualize the values associated
with locations.
Discussion and Practical Guidelines
One might present ﬁndings through maps and graphs in an inappropriate way
and give the wrong impression or even a misleading message (Tufte[REDACTED PHONE] ).
Sometimes this happens through ignorance, and sometimes it is done
deliberately to mislead. For example, the choice of colors, scale, map projection or
even map title might be misleading (see Box 2.1 ). As professionals, we have to
create accurate maps and graphs and always use solid statistics to back up our
ﬁndings.
Box 2.1 The Mercator projection was invented by Mercator in[REDACTED PHONE] to
help explorers navigate at sea. Since the earth's shape is approximated
                    

--- Page 82 ---
Box 2.1 (cont. )
by an ellipsoid, every two-dimensional map induces distortions in area,
length or angle. Mercator created a projection that preserves angles: by
drawing a line on this map between two points and measuring its angle from
north, ships could sail directly to the destination with no divergence. However,
this projection does not preserve area. In this projection, Brazil seems to have
the same area as Alaska; in fact, Brazil is five times the size of Alaska. This
does not mean that the map is wrong. It is just used wrongly, as its purpose
is to map angles correctly. To compare areas, we have to use other
projections. To avoid misleading interpretations, a map should be used for the
purposes it was created for.
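The Brazil/Alaska illusion can be checked with the Mercator scale factor: linear scale grows as sec(latitude), so areas are inflated by roughly sec²(latitude). A rough sketch (the central latitudes are approximations assumed for illustration):

```python
import math

def area_inflation(lat_deg):
    """Approximate Mercator area exaggeration at a latitude:
    linear scale is sec(lat), so area scales by sec(lat) squared."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

# Rough central latitudes, assumed for illustration
alaska = area_inflation(64.0)   # roughly 5.2x inflated
brazil = area_inflation(10.0)   # roughly 1.03x inflated

print(round(alaska, 2), round(brazil, 2), round(alaska / brazil, 1))
```

The roughly fivefold difference in inflation between the two latitudes is what makes the two countries look similar in size on a Mercator map.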
[REDACTED PHONE] Frequency Distribution and Histograms
Deﬁnitions
A frequency distribution table is a table that stores the categories (also called
"bins"), the frequency, the relative frequency and the cumulative relative
frequency of a single continuous interval variable (de Vaus[REDACTED PHONE] p. [REDACTED PHONE]; see
Table 2.1).

The frequency for a particular category or value (also called an "observation")
of a variable is the number of times the category or value appears in the
dataset.

Relative frequency is the proportion (%) of the observations that belong to a
category. It is used to understand how a sample or population is distributed
across bins (calculated as relative frequency = frequency / n).
Table 2.1 Frequency distribution table. Example for n= 15 postcodes and their population. Five
(frequency) postcodes have population between[REDACTED PHONE] and[REDACTED PHONE] (bin) people, which is[REDACTED PHONE]% (relative
frequency = 5/15) of the total postcodes. Overall, [REDACTED PHONE]% (cumulative relative frequency = [REDACTED PHONE]% +
[REDACTED PHONE]% + [REDACTED PHONE]%) of the postcodes have a population of at least[REDACTED PHONE] people.
Population
range/bins Frequency Relative frequency %Cumulative relative
frequency %
[REDACTED PHONE]–[REDACTED PHONE][REDACTED PHONE][REDACTED PHONE]–[REDACTED PHONE][REDACTED PHONE][REDACTED PHONE]–[REDACTED PHONE][REDACTED PHONE][REDACTED PHONE]–[REDACTED PHONE][REDACTED PHONE][REDACTED PHONE] –[REDACTED PHONE][REDACTED PHONE].00
n = [REDACTED PHONE]
                    

--- Page 83 ---
The cumulative relative frequency of each row is the sum of the relative
frequencies of this row and the rows above it. It tells us what percentage of the
population (observations) falls up to and including this bin. The final row
should be[REDACTED PHONE]%.
A frequency distribution histogram is a histogram that presents on the x-axis
the bins and on the y-axis the frequencies (or the relative frequencies) of a
single continuous interval variable (de Vaus[REDACTED PHONE] p. [REDACTED PHONE]; see Figure 2.1).
A probability density histogram is defined so that

(a) The area of each box equals the relative frequency (probability) of the
    corresponding bin
(b) The total area of the histogram equals 1
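These two properties can be verified numerically. A short sketch using NumPy's histogram with density=True (the sample values are invented for illustration):

```python
import numpy as np

# Illustrative sample (e.g., postcode populations)
values = np.array([1200, 1500, 1700, 2100, 2300, 2400, 3100, 3500, 4200, 4800])

# density=True rescales bar heights so that height * bin width = relative frequency
heights, edges = np.histogram(values, bins=4, density=True)
widths = np.diff(edges)

# Property (a): the area of each box equals the relative frequency of its bin
areas = heights * widths
print(areas.round(4))

# Property (b): the total area of the histogram equals 1
print(round(float(areas.sum()), 6))  # 1.0
```

With 10 observations split 3/3/2/2 across the four bins, the areas come out as 0.3, 0.3, 0.2 and 0.2, matching the relative frequencies and summing to 1.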
Why Use
Frequency distribution tables and histograms are used to analyze how the
values of the studied variable are distributed across the various categories.
The histogram can also be used to determine if the distribution is normal or
not. Additionally, it can be used to display the shape of a distribution and
examine the distribution ’s statistical properties (e.g., mean value, skewness,
kurtosis). Interesting questions can then be answered that may assist spatial
analysis or spatial planning (e.g., how many postcodes have a population
of less than a specific value; see Table 2.1). Histograms should not be confused
with bar charts, which are used mainly to display nominal or ordinal data (de
Vaus[REDACTED PHONE] p. [REDACTED PHONE]).
Figure 2.1 Frequency distribution histogram for the population variable in Table 2.1. Each
bar depicts the number of postcodes (frequencies on the y-axis) for each population bin
(on the x-axis).
                    

--- Page 84 ---
Discussion and Practical Guidelines
To calculate frequencies, we ﬁrst set the number of bins in which the values will
be grouped by dividing the entire range of values into sequential intervals
(bins). The choice of the appropriate number of bins as well as their range
depends on the project at hand and the scope of the analysis; it should be
meaningful. A trial-and-test method is an appropriate approach for choosing
how many bins to use. Using many bins is suitable if there is high variance and a
large population, but using too many bins makes interpretation dif ﬁcult. On the
other hand, a relatively small number of bins might conceal data variability (for
a small number of bins, we can use other graphs, such as pie graphs). As a rule
of thumb, a value that falls on the boundary of two bins should be placed on
the upper bin. For example, if the intervals for the variable “age”are set for
every ﬁve years (e.g., 0 –5, 5–10 and so on), then a ﬁve-year-old child should be
grouped with the 5 –10 bin. More mathematically, bins should be set as 0 to <
5, 5 to <10 and so on.
After defining the bins and their ranges, we count how many values (observations) lie within each bin. This count is the frequency. Adding all frequencies should give a total equal to the sample size (n). A frequency distribution table also includes the relative frequency (percentage) and the cumulative relative frequency (cumulative percentage; see Table 2.1). All relative frequencies have to add up to 100%. When frequencies are large, the plots may be hard to read, as some bins can be much larger than others. By using the relative frequency, we change the scale of the y-axis to 0–1 (0%–100%), but all essential characteristics of the frequency distribution – such as location, spread and shape – are unchanged (see next section; Peck et al., p. 28). Likewise, the final cumulative relative frequency should be 100%.
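The counting scheme just described can be sketched in a few lines of Python (a minimal illustration; the postcode populations and bin edges are invented for the example, and boundary values go to the upper bin as recommended above):

```python
# Build a frequency distribution table: frequency, relative frequency
# and cumulative relative frequency per bin (hypothetical data).
populations = [900, 1200, 2100, 2500, 2700, 3100, 3400, 3600, 4100, 4800]
edges = [0, 1000, 2000, 3000, 4000, 5000]   # bins: 0 to <1000, 1000 to <2000, ...

def frequency_table(values, edges):
    n = len(values)
    rows, cumulative = [], 0.0
    for lo, hi in zip(edges, edges[1:]):
        freq = sum(1 for v in values if lo <= v < hi)   # boundary -> upper bin
        rel = freq / n                                  # relative frequency
        cumulative += rel                               # cumulative relative frequency
        rows.append((lo, hi, freq, rel, cumulative))
    return rows

table = frequency_table(populations, edges)
assert sum(row[2] for row in table) == len(populations)   # frequencies sum to n
assert abs(table[-1][4] - 1.0) < 1e-9                     # cumulative ends at 100%
```

The two assertions restate the checks from the text: frequencies total n, and the cumulative relative frequency ends at 100%.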
Based on the frequency distribution table, a frequency distribution histogram can also be plotted (see Figure 2.1). In this histogram, each frequency is centered over the corresponding bin and is represented by a rectangle. For same-width bins, the area of the rectangle is proportional to the corresponding frequency. Histograms can also be created for the relative frequency. Each bar is centered on the same x-axis values as in the frequency distribution histogram, and its height equals the relative frequency, with the heights of all bins summing to 1. The relative frequency can be regarded as the probability of a value occurring.
Another option is to normalize (divide) the relative frequency by the width of each bin. In this case, the y-axis shows the relative frequency per unit of the variable on the x-axis. This type of histogram is called a probability density histogram, and its bins need not be of the same width. The area of each bin equals the probability of occurrence of a specific value or range of values, and the areas of all bins sum to 1. As the sample size increases, allowing more bins for the same range of values, the density histogram can be fitted by a continuous function called a "probability density function" (PDF) (see Figure 2.2). The PDF is used to find the probability that a value X falls within some interval. PDFs are widely used in statistics and spatial statistics. Most of the time, the units on the x-axis are standard deviations of the variable.
For a normal distribution, the normal PDF is (Illian et al., p. 53; see Figure 2.2):

\[ f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \tag{2.1} \]

where
μ is the mean of the population
σ is the population standard deviation
σ² is the population variance
x is the value of the variable
The normal PDF takes two parameters: the population mean and the population standard deviation. In such a distribution, we expect 68% of the values to lie within one standard deviation of the mean and 95% of the values to lie within two standard deviations of the mean (in both directions; see Figure 2.2).

Figure 2.2 Standard deviation of a normal distribution; 68% of objects lie within one standard deviation from the mean (34% in each direction).

The area between an interval and the curve equals the probability with which we anticipate a value to appear in our dataset. For example, the probability that a value will range between +1 and +2 standard deviations in a normal distribution is 13.5% (the shaded area on the right). Values larger than two standard deviations from the mean are expected at less than 2.5% (right tail). In other words, the probability of obtaining a value larger than two standard deviations above the mean is less than 2.5%. Likewise, values smaller than two standard deviations below the mean are expected at less than 2.5%. Some of the most commonly used significance tests in spatial statistics (as explained later) make use of the normal PDF to test whether a hypothesis holds by calculating the probability under a specific interval (significance level).
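Equation (2.1) and the 68%/95% figures can be verified numerically with the standard library alone; the sketch below uses the standard identity that the normal CDF can be written through the error function (nothing here is specific to this book):

```python
import math

def normal_pdf(x, mu, sigma):
    # Eq. (2.1): the normal probability density function
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Standard identity: Phi(x) = 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

within_1 = normal_cdf(1) - normal_cdf(-1)     # ~0.68: one sd either side of the mean
within_2 = normal_cdf(2) - normal_cdf(-2)     # ~0.95: two sd either side
between_1_2 = normal_cdf(2) - normal_cdf(1)   # ~0.135: the shaded right area
right_tail = 1 - normal_cdf(2)                # ~0.023: beyond +2 sd

assert abs(within_1 - 0.68) < 0.01
assert abs(within_2 - 0.95) < 0.01
assert right_tail < 0.025
```

Integrating the PDF over an interval (here via the CDF) is exactly what the significance tests mentioned above do when they compute the probability under a specific interval.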
The normal distribution (also called Gaussian distribution) is the most commonly used distribution in statistics, as many physical phenomena are approximately normally distributed (e.g., human weight and height). In a normal distribution, the values of a variable are more likely to be close to the mean, while much larger or smaller scores have low probabilities of occurring. The normal distribution is used in many statistical tests to draw conclusions about the distribution studied. The standard normal distribution has zero mean, unit standard deviation, a symmetrical histogram and a bell shape (see Figure 2.3A). Not all bell-shaped histograms reveal a normal distribution, as a normal distribution decreases from the top to the tails in a certain way (see Figure 2.3G; for more on this, see Section 2.6).
In general, any distribution can be described by three essential features: center, spread and shape. Analyzing these features provides information about (a) the center, (b) the extent, (c) the general shape, (d) the location and number of peaks and (e) the presence of gaps and outliers, as discussed next.
Measures of Center
Definitions
Measures of central tendency provide information about where the center of
a distribution is located. The most commonly used measures of center for
numerical data are the mean and the median (mode is another measure of
center and is the value that occurs most often in a sample).
The mean is the simple arithmetic average: the sum of the values of a variable divided by the number of observations (calculated for interval data), as in (2.2):

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{2.2} \]

where
n is the total number of observations
x_i is the score of the i-th observation
Σ is the summation symbol (pronounced sigma)
x̄ is the sample mean value
The median is the value that divides the scores, sorted from smallest to largest, in half. It is a measure of center.
Why Use
The mean is used to describe the center of a distribution, while the median
is used to split the frequency distribution histogram into two equal
area parts.
Interpretation
If we list scores from smallest to largest, the middle score is the median. It cuts the scores into two equal parts: 50% of the objects have values larger than the median, and 50% have values smaller than the median. In addition, the median splits a frequency distribution histogram into two equal-area parts, while the mean is the balance point of the distribution histogram (the sum of the deviations to the left of the mean equals the sum of the deviations to the right).

Figure 2.3 (A), (B), (C) Symmetric histograms. (D) Positively skewed. (E) Negatively skewed. (F) Different types of bell-shaped distributions. (G) Curves with various kurtosis values.
Discussion and Practical Guidelines
We should be cautious when interpreting the mean because (a) the same mean might result from completely different distributions and (b) extreme values or outliers change the mean value and inflate the skewness of a distribution significantly. When we exclude outliers or extreme values, the mean is likely to change significantly. In a standard normal distribution, the mean is zero (values expressed in standard deviations) and lies at the center of the distribution (see Figure 2.3A). The median overcomes the outlier problem, as it is based on ranked positions and not on the actual values. When n is odd, there is a single middle value, the median. When n is even, there are two "middle values," and we take their average to obtain the median. The median is located at the center of a normal distribution and coincides with the mean (see Figure 2.3A). In other types of distributions, the median deviates from the mean. Typically, but not always (depending on the values), the median lies to the right of the mean in a negatively skewed distribution (see Figure 2.3E) and to its left in a positively skewed one. The median can be calculated for both ordinal and interval data.
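The contrast between the two measures is easy to demonstrate with the standard library (the scores are hypothetical):

```python
import statistics

scores = [12, 14, 15, 15, 16, 17, 18]     # hypothetical, roughly symmetric
with_outlier = scores + [90]              # one extreme score added

# The mean is dragged toward the outlier; the median barely moves.
assert statistics.median(scores) == 15
assert statistics.median(with_outlier) == 15.5
assert statistics.mean(with_outlier) - statistics.mean(scores) > 9
```

One extreme value shifts the mean by more than nine units here, while the median moves only from 15 to 15.5, which is why the median is preferred for skewed data.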
Measures of Shape
Definitions
Measures of shape describe how values (e.g., frequencies) are distributed across the intervals (bins) and are measured by skewness and kurtosis.
The shape of the distribution is the curved line (sometimes a straight line) that continuously approximates the top midpoints of the bins. The x-axis closes the shape, so its area can be calculated. If a shape is symmetrical around a vertical line, the part of the histogram to the right is the mirror image of the left part (see Figure 2.3A–C).
Skewness is the measure of the asymmetry of a distribution around
the mean.
Kurtosis, from the graphical-inspection perspective, is the degree of peakedness or flatness of a distribution.
Why Use
Skewness is used to identify how values are distributed around the mean, while kurtosis reveals how stretched a distribution is on the y-axis compared to the normal distribution (see Figure 2.3G). Peakedness and flatness are actually driven by the size of the tails of a distribution. For this reason, kurtosis is prone to outliers, as outliers tend to stretch the distribution tails and significantly change the mean. Thus, the upper hill of a curve may move upward or downward according to the strength and location of an outlier.
Interpretation
If a histogram is not symmetric, it is skewed. If the right tail of the histogram stretches considerably more than the left tail, the histogram is called "positively skewed" or "right skewed" (see Figure 2.3D). If the left tail is stretched, we call the histogram "negatively skewed" or "left skewed" (see Figure 2.3E). Skewness greater than 1 or less than −1 typically indicates a nonsymmetrical distribution (de Vaus). Values higher than 1.5 or less than −1.5 indicate large skewness, meaning that the data stretch far from the mean in some direction. For example, suppose that Figure 2.3D depicts the frequency distribution (y-axis) of annual per capita income (x-axis). In this case, the distribution of income is positively skewed. Only a few people have very high incomes (they lie in the right tail, far from the mean), while the majority have incomes less than the mean (which lies to the right of the median). Income is unequally distributed: more have less, and less have more.
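Both shape measures can be computed from central moments. The sketch below uses the moment-based definitions with the zero-centered ("excess") convention for kurtosis discussed below; the income values are invented:

```python
def central_moment(values, k):
    # k-th moment about the mean
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n

def skewness(values):
    # m3 / m2^(3/2): positive when the right tail stretches
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    # m4 / m2^2 - 3, so a normal distribution scores about 0
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

incomes = [10, 11, 12, 12, 13, 14, 15, 40]   # a few very high earners
assert skewness(incomes) > 1                 # strongly right (positively) skewed
assert excess_kurtosis(incomes) > 0          # heavier tail than the normal
```

For a perfectly symmetric dataset the third moment cancels, so skewness is exactly zero; the single high income pushes skewness above the ±1 rule of thumb.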
A zero kurtosis indicates near-normal peakedness. A negative kurtosis indicates a flatter distribution (lower than normal), while a positive kurtosis reveals a distribution with a higher peak than the normal distribution. Strictly speaking, the kurtosis of a normal distribution is 3 (de Vaus). Most statistical software subtracts 3 from the final figure to adjust to the zero definition. This provides quicker understanding, as positive or negative values are then directly interpreted as distributions above or below the normal. Some popular software, such as Matlab and ArcGIS, does not follow the zero definition and regards a kurtosis of 3 as that of the normal distribution.

Measures of Spread/Variability – Variation
Definitions
Measures of spread (also called measures of variability, variation, diversity or dispersion) provide information about how much the values of a variable differ among themselves and in relation to the mean. The most common measures are as follows (de Smith):

• Range (Peck et al.)
• Deviation from the mean
• Variance
• Standard deviation
• Standard distance (see Section …)
• Percentiles and quartiles (see Section …)
The range is the difference between the largest and smallest values of the variable studied, as in (2.3):

\[ \text{Range} = x_{\max} - x_{\min} \tag{2.3} \]

where x_max is the maximum value of a variable and x_min is the minimum value of the same variable (see Figure 2.3E).

The deviation from the mean is the subtraction of the mean from each score, as in (2.4):

\[ \text{Deviation} = (x_i - \bar{x}) \tag{2.4} \]

where
x_i is the score of the i-th object
x̄ is the sample mean value

The sum of all deviations is zero (sometimes, due to rounding, the sum is only very close to zero), as in (2.5):

\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \tag{2.5} \]
The sample variance is the sum of the squared deviations from the mean divided by n − 1, as in (2.6) (see Table 2.2):

\[ s^{2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}{n-1} \quad \text{(sample variance)} \tag{2.6} \]

Squared values are used to turn negative deviations into positive ones. To calculate the variance of the entire population, denoted by σ², we simply divide by n, as in (2.7):

\[ \sigma^{2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}{n} \quad \text{(population variance)} \tag{2.7} \]

The standard deviation is the square root of the variance (2.8, 2.9) (see Table 2.2):

\[ s = \sqrt{s^{2}} \quad \text{(sample standard deviation)} \tag{2.8} \]

\[ \sigma = \sqrt{\sigma^{2}} \quad \text{(population standard deviation)} \tag{2.9} \]
Table 2.2 Sample and population statistical symbols. Sample statistics are denoted by Latin letters and population parameters by Greek letters.

Measure              | Sample statistic symbol      | Population parameter symbol
Mean                 | x̄ (pronounced: ex bar)       | μ (pronounced: mu)
Variance             | s² (pronounced: es squared)  | σ² (pronounced: sigma squared)
Standard deviation   | s_x (pronounced: es of ex)   | σ (pronounced: sigma)
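Equations (2.6)–(2.9) translate directly into code; a small sketch with invented data, cross-checked against Python's statistics module:

```python
import math
import statistics

def sample_variance(values):
    # Eq. (2.6): squared deviations about the mean, divided by n - 1
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def population_variance(values):
    # Eq. (2.7): same numerator, divided by n
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

data = [2, 4, 4, 4, 5, 5, 7, 9]
s = math.sqrt(sample_variance(data))          # Eq. (2.8)
sigma = math.sqrt(population_variance(data))  # Eq. (2.9)

assert population_variance(data) == 4
assert abs(population_variance(data) - statistics.pvariance(data)) < 1e-12
assert abs(sample_variance(data) - statistics.variance(data)) < 1e-12
assert sample_variance(data) > population_variance(data)   # n - 1 inflates slightly
```

`statistics.variance` and `statistics.pvariance` implement the same n − 1 and n divisors, respectively, so either route gives the values in Table 2.2's notation.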
Why Use
The range is used to assess the variation of a variable's values, while the deviation from the mean is used to calculate how far a score lies from the mean. The variance measures the spread of the values of a variable, and the standard deviation indicates the size of a typical deviation from the mean. In essence, variance and standard deviation reflect the average distance of the observations from the mean (de Vaus). The standard deviation is easier to interpret than the variance, as it is measured in the same units as the variable studied.
Interpretation
The greater the range, the more variation in the variable's values, which might also reveal potential outliers. Large values of s² (variance) reveal great variation in the data, indicating that many observations have scores far from the mean. If the variation is large, we may cut off the top and bottom 5% or 10% of the dataset to produce a more compact distribution. This typically happens in satellite image analysis for color enhancement.

A positive z-score indicates the number of standard deviations a value lies above the mean, and a negative z-score the number of standard deviations below it. The standard deviation is used to estimate how many objects in the sample lie a given distance from the mean in reference to the z-score (e.g., 1 or 2) (see Section …). In any normal distribution and for a specific variable:

• Approximately 68% of all values fall within one standard deviation of the mean (z-score = 1): [(x̄ − 1 × standard deviation) up to (x̄ + 1 × standard deviation)]
• Approximately 95% of all values fall within two standard deviations of the mean (z-score = 2): [(x̄ − 2 × standard deviation) up to (x̄ + 2 × standard deviation)]
• Nearly all values fall within three standard deviations of the mean (z-score = 3): [(x̄ − 3 × standard deviation) up to (x̄ + 3 × standard deviation)]
Discussion and Practical Guidelines
It is important to note that variation is not the same as variance (a synonym for variation is variability). Variation and variability are not specific quantities; they are general terms expressing fluctuations in values. These fluctuations are calculated through the measures of spread.
The sample variance is the total squared deviation from the mean divided by the sample size (n) – but not exactly: it is divided by n minus 1. Why minus 1? If it were just n, the result would be the average squared deviation, which may seem to make more sense. In fact, the minus 1 is necessary: variance tends to be underestimated when we use samples, and overestimating the variance is better than underestimating it (Linneman, p. 90). In more advanced statistics, the term n − 1 in this formula reveals the degrees of freedom (df). Degrees of freedom generally equal the sample size
(n) minus the number of parameters estimated. It is actually the number of objects (in the sample) that are free to vary when estimating statistical parameters. "Free to vary" means that these objects have the freedom to take any value (inside the set in which the function is defined), while the others are constrained. If we are interested in the standard deviation of the entire population, denoted by σ, we simply divide by n, as in (2.7).
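The need for the n − 1 divisor can be seen empirically: drawing many small samples from a large synthetic population, the n divisor systematically underestimates the true variance, while n − 1 roughly removes the bias (a simulation sketch; the population is randomly generated, so the exact numbers are illustrative only):

```python
import random
import statistics

random.seed(7)
population = [random.gauss(50, 10) for _ in range(20_000)]
true_var = statistics.pvariance(population)   # the "real" population variance

trials, k = 2_000, 5
biased = unbiased = 0.0
for _ in range(trials):
    sample = random.sample(population, k)
    biased += statistics.pvariance(sample)    # divide by n
    unbiased += statistics.variance(sample)   # divide by n - 1
biased /= trials
unbiased /= trials

assert biased < true_var                                  # n-divisor undershoots
assert abs(unbiased - true_var) < abs(biased - true_var)  # n - 1 removes most bias
```

With samples of size 5, the n-divisor estimate averages roughly (n − 1)/n of the true variance, which is exactly the underestimation the text describes.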
Selecting between the sample statistic and the population parameter formula (see Table 2.2) depends on the nature of the analysis and the available data. Suppose we want to calculate the standard deviation of income for a specific city. If we have data for the city's entire population (through a census), we should apply the population standard deviation formula; the result is not an estimate but an actual population value. If we want to estimate the standard deviation of income for the entire country based only on this city sample (we infer from the city to the country), then we should apply the sample standard deviation formula (see more on inferential statistics in Section 2.5).
Finally, by combining the standard deviation and z-scores (see Section …), we can describe how objects (and their values) lie within a distribution. For example, suppose that the mean income in the 30 postcodes of a city is 15,000 US dollars and the standard deviation is 2,000 US dollars. The standard deviation of 2,000 US dollars means that, on average, incomes vary (in both directions) from the mean by 2,000 US dollars. If the distribution of income follows a normal distribution (in practice, it does not), approximately 68% of the postcodes (nearly 20) would have incomes in the range [15,000 − 1 × standard deviation up to 15,000 + 1 × standard deviation], or between 13,000 and 17,000 US dollars.

Additional questions can be asked using the standard deviation and z-score. For example, how many postcodes are likely to have incomes higher than 19,000 US dollars? This is an important type of question, especially when we focus on certain subpopulations. We first calculate the z-score: (value − mean)/(standard deviation) = (19,000 − 15,000)/2,000 = 2 standard deviations. This means that a postcode with an income of 19,000 US dollars lies two standard deviations above the mean. As mentioned above, only 5% of objects lie more than two standard deviations from the mean in the case of a normally distributed variable, that is, 2.5% in each direction. To answer the original question, about 2.5% of the postcodes (that is, roughly one postcode) are expected to have incomes larger than 19,000 US dollars.
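The postcode example can be reproduced with `statistics.NormalDist` (assuming, as the text does, a normal distribution of incomes):

```python
import statistics

mean, sd, n_postcodes = 15_000, 2_000, 30

z = (19_000 - mean) / sd
assert z == 2.0                     # 19,000 USD lies two sd above the mean

# Share of postcodes expected above 19,000 USD under a normal assumption:
share_above = 1 - statistics.NormalDist(mean, sd).cdf(19_000)
assert 0.02 < share_above < 0.03                  # ~2.3%: the right tail
assert round(share_above * n_postcodes) == 1      # roughly one of the 30 postcodes
```

The exact tail beyond +2 sd is about 2.3% rather than 2.5%; the text's 2.5% comes from rounding the 95% figure.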
Percentiles, Quartiles and Quantiles
Definition
A percentile is a value in a ranked data distribution below which a given percentage of observations falls. Every distribution has 100 percentiles.

The quartiles are the 25th, 50th and 75th percentiles, called the "lower quartile" (Q1), the "median" and the "upper quartile" (Q3), respectively.

The interquartile range (IQR) is obtained by subtracting the lower quartile from the upper quartile:

\[ IQR = \text{Upper quartile} - \text{Lower quartile} = Q3 - Q1 \]
Quantiles are equal-sized, adjacent subgroups that divide a distribution.
Why Use
Percentiles are used to place a value in context: they show what percentage of the total number of values is smaller or larger than it. The lower quartile (Q1), the upper quartile (Q3) and the median are commonly used to show how scores are distributed for every 25 percentiles. The interquartile range provides a measure of the variability of the middle 50% of the objects around the median. Quantiles are often used to divide probability distributions into areas of equal probability. In fact, percentiles are quantiles that divide a distribution into 100 subgroups.
Interpretation
If the 20th percentile of a distribution equals some value v, then 20% of the observations have values less than v. If a student's grade lies in the 80th percentile, the student achieved better grades than 80% of his/her classmates. The 50th percentile is the median. As mentioned, the median is the score that splits the ranked scores in two; thus, 50% of the objects have higher scores, and 50% have lower ones.
Discussion and Practical Guidelines
Percentiles and quartiles are not prone to outliers, as they are based on the ranks of objects. For example, the maximum score would lie in the last percentile whether it is an outlier or not. Quartiles provide an effective way to categorize a large amount of data into a mere four categories. Finally, GIS software uses quantiles to color and symbolize spatial entities when there are many different values.
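Quartiles and the IQR are one call in the standard library (the scores are invented; `method="inclusive"` selects the common linear-interpolation convention, one of several quantile definitions in use):

```python
import statistics

scores = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 13]

q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

assert q2 == statistics.median(scores) == 7   # the 50th percentile is the median
assert (q1, q3) == (4.5, 9.5)
assert iqr == 5.0
```

Passing a larger `n` (e.g., `n=100`) to `statistics.quantiles` yields percentiles instead of quartiles, matching the "quantiles that divide a distribution into 100 subgroups" view above.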
Outliers
Definition
Outliers are the most extreme scores of a variable.
Why Use
They should be traced for three main reasons:
• Outliers might be wrong measurements
• Outliers tend to distort many statistical results
• Outliers might hide significant information worth discovering and analyzing further
Interpretation (How to Trace Outliers)
For a univariate distribution, outliers can distort the mean and the standard deviation. In bivariate and multivariate analyses, many statistics – such as the correlation coefficient, trend lines and regression analysis – will give false results if outliers exist. The most common way of tracing outliers is graphical representation through a histogram or a boxplot. In a histogram, an isolated bar at the far left or far right is a serious indication of outliers in the data. Very large skewness (positive or negative) is also an indication of outliers' presence. Another approach is to regard as outliers those scores lying more than 2.5 standard deviations from the mean. In fact, it is not easy to set a specific number of standard deviations for identifying an outlier: when we calculate how many standard deviations a potential outlier lies from the mean, we have to consider that the outlier itself inflates the standard deviation and also shifts the mean. A scatter plot (see Section …) is another effective way to locate an outlier in bivariate analysis. There is also a set of methods for tracing outliers by analyzing the residuals in regression analysis (e.g., standardized residuals), but we will not refer further to these methods in this book. We should be cautious about labeling an object as an outlier, because removing it from the dataset leads to new values for the mean, standard deviation and other statistics. Outliers should be eliminated only if we understand why they exist and whether similar values are likely to reappear. Outliers often reveal valuable information. For example, an outlier value for a room's temperature indicates the potential for a fire, allowing preventive action. An outlier value for credit card use may reveal a different location from those commonly used (e.g., a different country) and thus potential fraud (Grekousis & Fotis). Thus, defining a value as an outlier depends on the broader context of the study, and the analyst should decide carefully whether to eliminate or include it in the dataset.
Discussion and Practical Guidelines
We can handle traced outliers based on the following guidelines:
• Scrutinize the original data (if available) to check whether the outliers' scores are due to human error (e.g., data entry). If the scores are correct, attempt to explain such high or low values, as they are unlikely to be just a random phenomenon.
• Transform the variable. Still, data transformation does not guarantee the elimination of outliers. In addition, it may not be desirable to transform the entire dataset for only a small number of outliers.
• Delete the outlier from the dataset or change its score to equal the value of three standard deviations (de Vaus, p. 94). The choice depends on the effect it will have on the results, but deletion is preferred. In either case, the deleted or changed score should be reported.
• Temporarily remove the outlier from the dataset and calculate the statistics; then include the outlier again for further analysis. For example, suppose we study the socioeconomic profiling of postcodes. Some postcodes might have extremely high incomes per capita relative to others, but they also carry additional socioeconomic information that might not include outlier values. If we completely remove the postcodes with outlying incomes, we lose valuable information (regarding the other variables). For this reason, it is wiser to temporarily remove the income outliers only for those statistics they distort and include them again in later analysis.
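The 2.5-standard-deviation screen and the temporary-removal idea can be sketched as follows (hypothetical incomes; note the caveat above that the outlier itself inflates the mean and standard deviation used for the screen):

```python
import statistics

def flag_outliers(values, k=2.5):
    # Flag scores lying more than k standard deviations from the mean.
    # Caveat: the outlier itself inflates both the mean and the sd.
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]

incomes = [14, 15, 15, 16, 16, 17, 17, 18, 60]   # hypothetical, one extreme score
outliers = flag_outliers(incomes)
assert outliers == [60]

# Temporarily set the outlier aside for the statistics it distorts:
trimmed = [v for v in incomes if v not in outliers]
assert statistics.mean(trimmed) < statistics.mean(incomes)
```

Keeping `outliers` as a separate list makes it easy to reintroduce the flagged postcodes later for the analyses they do not distort.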
Boxplot
Definition
A boxplot is a graphical representation of the key descriptive statistics of a distribution.
Why Use
To depict the median, spread (regarding percentiles) and presence of outliers.
Interpretation
The characteristics of a boxplot are as follows (see Figure 2.4):
• The box is defined by the lower quartile Q1 (25%; left vertical edge of the box) and the upper quartile Q3 (75%; right vertical edge of the box). The length of the box equals the interquartile range IQR = Q3 − Q1.
• The median is depicted by a line inside the box. If the median is not centered, skewness exists.
• To trace and depict outliers, we calculate the whiskers, which are the lines starting from the edges of the box and extending to the last object not considered an outlier.
• Objects lying further than 1.5 × IQR from the quartiles are considered outliers.
• Objects lying more than 3.0 × IQR away are considered extreme outliers, and those between 1.5 × IQR and 3.0 × IQR are considered mild outliers. One may change the 1.5 or 3.0 coefficient according to the study's needs, but most statistical programs use these values by default.
• Whiskers do not necessarily stretch to 1.5 × IQR but to the last object lying before this distance from the upper or lower quartile.

If a distribution is positively skewed, the median tends to lie toward the lower quartile inside the box of a boxplot (see Figure 2.5A). A boxplot with the median line near the center of the box and with symmetric whiskers slightly longer than the box length tends to represent a normal distribution (see Figure 2.5B). Negatively skewed distributions tend to look like graph C (see Figure 2.5C). Outliers might lie in any direction. They can also be traced as isolated bins (see Figure 2.5A and C).
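The fence and whisker rules translate directly into code (a sketch; the heights echo Figure 2.4's basketball-player example, but the numbers are invented):

```python
import statistics

def boxplot_fences(values):
    # Tukey-style rule: mild outliers beyond 1.5 * IQR from the quartiles,
    # extreme outliers beyond 3.0 * IQR.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)
    extreme = [v for v in values if v < outer[0] or v > outer[1]]
    mild = [v for v in values
            if (v < inner[0] or v > inner[1]) and v not in extreme]
    return inner, mild, extreme

heights = [160, 165, 168, 170, 172, 173, 175, 176, 178, 180, 210]
inner, mild, extreme = boxplot_fences(heights)

assert extreme == [210]      # the basketball player is an extreme outlier
assert mild == []
# The upper whisker stops at the last non-outlier, not at 1.5 * IQR itself:
assert max(v for v in heights if v <= inner[1]) == 180
```

The last assertion reflects the final bullet above: the whisker ends at 180 (the tallest non-outlier), not at the inner fence.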
Discussion and Practical Guidelines
Apart from describing a single distribution, boxplots can be used to compare distributions of the same variable across different groups. To compare such distributions, we use parallel boxplots (see Figure 2.6), plotted side by side in a vertical representation, which is more common than the horizontal representation. We plot each group on the x-axis and the values of the common variable on the y-axis. A graphical examination is the first step in comparing these distributions – for example, to see if their medians differ. In Figure 2.6, we use parallel boxplots to describe two groups, urban and rural populations (groups on the x-axis), in relation to annual income (variable on the y-axis). Statistical tests such as the Mann–Whitney U test should be used to check whether any observed difference between the medians is statistically significant.

Figure 2.4 (A) Basic boxplot characteristics. In this graph, outliers exist only in the right part. Generally, outliers might exist in both parts concurrently. Whiskers stop at the largest or smallest observation that is not an outlier. In the left part, the minimum value of the variable lies less than 1.5 × IQR away from the lower quartile (Q1), so there is no outlier. Whisker lengths are not necessarily the same in the two parts. (B) Eleven people ranked in ascending order according to their height. The far-right person is a basketball player, and he is considerably taller than the rest (outlier).

Figure 2.5 Boxplot general look for different types of distributions.

Figure 2.6 Boxplots plotted side by side to compare distributions and medians. (A) A simple graphical inspection shows that the median income in urban areas is larger than the median income in rural areas. (B) Notched boxplot. (C) Notch overlapping indicates no statistically significant difference between the median values.
A particular type of boxplot, the notched boxplot, is used to provide a more accurate graphical representation for comparison purposes (Chambers et al.; see Figure 2.6B). For each group, it provides the 95% confidence interval of the median (see Section … for confidence intervals). If these intervals do not overlap when we compare the two distributions (we inspect the y-axis), there is strong evidence at the 95% confidence level that the medians differ (see Figure 2.6B). The confidence interval is calculated using the following formula:

\[ \text{median} \pm 1.57\, \frac{IQR}{\sqrt{n}} \]

In other words, the notch extends 1.57 × IQR/√n above and below the median, so its height is proportional to the interquartile range and shrinks as the sample size grows. This interval is expressed in the same units as the variable studied. If we compare the intervals of two parallel boxplots and find that there are no common values, we may conclude, at the 95% confidence level, that the true medians of the groups differ. In Figure 2.6C, although the medians look different, the notches overlap, and we cannot conclude that there is a statistically significant difference between the medians. To better assess whether there is indeed a statistically significant difference in the median between two groups, we should use statistical tests (e.g., a Mann–Whitney U test, as mentioned).
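The notch formula can be computed per group and the overlap inspected directly (hypothetical urban/rural incomes; non-overlap is only suggestive, and a Mann–Whitney U test remains the formal check):

```python
import math
import statistics

def notch_interval(values):
    # Approximate 95% CI of the median: median +/- 1.57 * IQR / sqrt(n)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half, median + half

urban = [30, 32, 33, 35, 36, 38, 40, 41, 43, 45]   # hypothetical annual incomes
rural = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]

u_lo, u_hi = notch_interval(urban)
r_lo, r_hi = notch_interval(rural)

# Non-overlapping notches: strong informal evidence the medians differ.
assert r_hi < u_lo
```

Because the half-width shrinks with √n, larger groups produce tighter notches and make genuine median differences easier to see.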
Normal QQ Plot
Definition
The normal QQ plot is a graphical technique that plots the quantiles of the data against the quantiles of a theoretical normal distribution, which form a straight line.
Why Use
A normal QQ plot is used to identify whether the data are normally distributed.
Interpretation
If data points deviate from the theoretical straight line, this is an indication of non-normality (see Figure 2.7). The line represents a normal distribution at a 45° slope. If the distribution of the variable is normal, the points will lie on this reference line. If data points deviate from the straight line and curves appear (especially at the beginning or the end of the line), the normality assumption is violated. For instance, the plot in Figure 2.7 reveals non-normally distributed data.
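As a sketch of how the plotted points are built, the following snippet pairs each sorted observation with the matching quantile of a standard normal distribution. The plotting-position rule (i − 0.5)/n is one common convention, not the book's prescription; packages differ (e.g., Blom's (i − 3/8)/(n + 1/4)):

```python
from statistics import NormalDist

def normal_qq_points(values):
    """(theoretical, observed) quantile pairs for a normal QQ plot.
    Points from a normal sample lie near a straight line."""
    xs = sorted(values)
    n = len(xs)
    nd = NormalDist()  # standard normal
    return [(nd.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(xs, 1)]
```

Plotting these pairs and the reference line then reproduces the visual check described above.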
2.3 ESDA Tools and Statistics for Analyzing Two or More Variables (Bivariate Analysis)
Spatial analysis often focuses on two different variables simultaneously. This type of analysis is called “bivariate,” and the dataset used is called a “bivariate dataset.” The study of more than two variables, as well as the dataset used, is called “multivariate.” Multivariate methods will be presented in Chapter 5.
The most common ESDA techniques and descriptive statistics for analyzing bivariate data include:
• Scatter plot
• Scatter plot matrix
• Covariance and variance–covariance matrix
• Correlation coefficient
• Pairwise correlation
• General QQ plot

[REDACTED PHONE] Scatter Plot
Definition
A scatter plot displays the values of two variables as a set of point coordinates (see Figure 2.8).
Figure 2.7 Normal QQ plot.
Why Use
A scatter plot is used to identify the relations between two variables and trace
potential outliers.
Figure 2.8 (A) Scatter plot for variables Income and House size. (B) A linear trend superimposed (with positive slope) reveals a positive linear association between the two variables: as income increases, so does house size. A value on the far right of the graph indicates a potential outlier. (C) In exploratory spatial data analysis, a one-to-one linkage exists whereby each dot in the scatter plot stands for a single spatial unit depicted in a single location on a map. Notice also that the outlier in the scatter plot is not a locational outlier, as the spatial entity does not lie far away from the rest of the entities (for locational outliers, see Section [REDACTED PHONE]).
Interpretation
Inspecting a scatter plot allows one to identify linear or other types of association (see Figure 2.8A). If points tend to form a linear pattern, a linear relationship between the variables is evident. If data points are scattered, the linear correlation is close to zero, and no association is observed between the two variables. Data points that lie far away in the x or y direction (or both) are potential outliers (see Figure 2.8B).
Discussion and Practical Guidelines
The first thing to inspect in any bivariate analysis is a scatter plot, which displays data as a collection of points (X, Y). In spatial data analysis, a scatter plot is a map with as many points as there are spatial objects (rows in the dataset table; Figure 2.8C). For each data row in the database table, we create a set of coordinates. For example, for a single object, the value of variable A (Income) is the X coordinate, and the value of variable B (House size) is the Y coordinate (see Figure 2.8C). The X, Y coordinates can be switched (A to Y and B to X) with no significant change in the analysis. Each point in the scatter plot stands for a single spatial object in the map (polygons in Figure 2.8C). As the scatter plot belongs to the ESDA toolset, it offers the ability to highlight the spatial unit to which a point in the plot is linked in the map while brushing it. For instance, if we brush the outlier point, we directly locate the polygon that corresponds to this value. Likewise, we can select one or more polygons in the map and identify their values in the scatter plot. We can also test whether neighboring polygons in the map cluster in the scatter plot and identify whether object clustering in space creates attribute clusters as well.
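The one-to-one linkage described above can be mimicked outside a GIS with a few lines of Python (a hypothetical sketch; the field names `unit_id`, `income` and `house_size` and all values are invented for illustration):

```python
# Each record carries a spatial-unit id plus the two plotted attributes,
# so selecting points in attribute space identifies polygons on the map.
records = [
    {"unit_id": "P01", "income": 21000, "house_size": 60},
    {"unit_id": "P02", "income": 34000, "house_size": 85},
    {"unit_id": "P03", "income": 39000, "house_size": 95},
    {"unit_id": "P04", "income": 120000, "house_size": 210},  # far-right point
]

def brush(records, income_min):
    """Ids of the spatial units whose income exceeds income_min,
    mimicking brushing the far-right points of the scatter plot."""
    return [r["unit_id"] for r in records if r["income"] > income_min]
```

Brushing the far-right region (e.g., `brush(records, 100000)`) returns the id of the candidate outlier's polygon, which is exactly the linked-selection behavior of ESDA software.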
[REDACTED PHONE] Scatter Plot Matrix
Definition
A scatter plot matrix depicts all possible pairs of scatter plots when more than two variables are available (see Figure 2.9).
Why Use
The visual inspection of all pair combinations facilitates (a) locating variables with high or no association, (b) identifying the relationship type (i.e., linear or nonlinear) and (c) spotting outlying points.
Interpretation
The closer the data points are to a linear pattern, the higher their linear correlation is likely to be. On the other hand, the more scattered a pattern is, the weaker the linear relationship between the two studied variables. The further away a data point lies from the main point cloud, the more likely it is to be an outlier.
Discussion and Practical Guidelines
By inspecting a scatter plot matrix, one can quickly identify linear or other types of association for multiple combinations of variables in a single graph. By identifying which variables exhibit high associations, we can proceed to further analysis, such as mapping them using choropleth maps or examining them for potential bivariate spatial autocorrelation (see Chapter 4). A scatter plot matrix is a quick and efficient ESDA technique for identifying the associations of all variable combinations and is usually a good way to begin an analysis.
[REDACTED PHONE] Covariance and Variance–Covariance Matrix
Definition
Covariance is a measure of the extent to which two variables vary together (i.e., change in the same linear direction). Covariance Cov(X, Y) is calculated as ([REDACTED PHONE]) (Rogerson [REDACTED PHONE] p. 87):

\mathrm{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \qquad ([REDACTED PHONE])
Figure 2.9 All combinations of scatter plot pairs for four census variables: Income, Expenses, Bachelor degree (education), Operators (occupation).
where
$x_i$ is the score of variable X of the i-th object
$y_i$ is the score of variable Y of the i-th object
$\bar{x}$ is the mean value of variable X
$\bar{y}$ is the mean value of variable Y
This formula is for the sample covariance. For the population covariance, we divide by n instead of n − 1.
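The two versions of the formula translate directly into Python (a plain illustration with no library dependencies):

```python
def sample_covariance(x, y):
    """Cov(X, Y) = sum((x_i - mean_x) * (y_i - mean_y)) / (n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y)
               for xi, yi in zip(x, y)) / (n - 1)

def population_covariance(x, y):
    """Same numerator, divided by n instead of n - 1."""
    n = len(x)
    return sample_covariance(x, y) * (n - 1) / n
```

For example, `sample_covariance([1, 2, 3], [2, 4, 6])` gives 2.0, a positive value reflecting that the two sequences increase together.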
Why Use
Covariance measures the extent to which two variables of a dataset change in
the same or opposite linear direction.
Interpretation
For positive covariance, if variable X increases, then variable Y increases as well. If the covariance is negative, the variables change in opposite directions (one increases, the other decreases). Zero covariance indicates no linear correlation between the variables.
Covariance can also be presented along with the variance of each variable in a variance–covariance matrix ([REDACTED PHONE]). In this matrix, the diagonal elements contain the variance of each variable (calculated based on the dataset matrix A [[REDACTED PHONE]]), and the off-diagonal elements contain the covariances of all pairs of combinations of these variables.
\mathrm{Variance\text{–}Covariance} =
\begin{bmatrix}
s_{1,1}^{2} & \mathrm{Cov}(X_1 X_2) & \mathrm{Cov}(X_1 X_3) & \cdots & \mathrm{Cov}(X_1 X_p) \\
\mathrm{Cov}(X_2 X_1) & s_{2,2}^{2} & \cdots & \cdots & \vdots \\
\mathrm{Cov}(X_3 X_1) & \cdots & \ddots & \cdots & \vdots \\
\vdots & \cdots & \cdots & \ddots & \vdots \\
\mathrm{Cov}(X_p X_1) & \cdots & \cdots & \cdots & s_{p,p}^{2}
\end{bmatrix} \qquad ([REDACTED PHONE])

A =
\begin{bmatrix}
a_{1,1} & \cdots & a_{1,p} \\
\vdots & \ddots & \vdots \\
a_{n,1} & \cdots & a_{n,p}
\end{bmatrix} \qquad ([REDACTED PHONE])

where the p columns of A correspond to the variables $X_1, X_2, \ldots, X_p$, p is the total number of variables (X) and n is the total number of observations (a) of the dataset A ([REDACTED PHONE]).
Discussion and Practical Guidelines
The variance–covariance matrix is applied in many statistical procedures to produce estimator parameters in a statistical model, such as the eigenvectors and eigenvalues used in principal component analysis (see Chapter 5). It is also used in the calculation of correlation coefficients. Covariance and the variance–covariance matrix are descriptive statistics and are widely used in many spatial statistical approaches.
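As a sketch, NumPy computes the same matrix in one call (the small 3 × 2 data matrix below is invented for illustration):

```python
import numpy as np

# Rows of A are the n observations, columns the p variables.
A = np.array([[2.0, 8.0],
              [4.0, 6.0],
              [6.0, 10.0]])

# np.cov expects variables in rows, hence the transpose; ddof=1 gives
# the sample (n - 1) version.  The diagonal holds the variances, the
# off-diagonal cells the covariances of each pair of variables.
V = np.cov(A.T, ddof=1)
```

The resulting V is symmetric, since Cov(X_i, X_j) = Cov(X_j, X_i).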
[REDACTED PHONE] Correlation Coefficient
Definition
The correlation coefficient r(x, y) analyzes how two variables (X, Y) are linearly related. Among the correlation coefficient metrics available, the most widely used is Pearson's correlation coefficient (also called the Pearson product-moment correlation), given by ([REDACTED PHONE]) (Rogerson [REDACTED PHONE] p. 87):

r(x, y) = \frac{\mathrm{Cov}(X, Y)}{s_x s_y} \qquad ([REDACTED PHONE])

where
Cov(X, Y) is the sample covariance between the variables
$s_x$ is the sample standard deviation of variable X
$s_y$ is the sample standard deviation of variable Y
The population correlation coefficient is calculated using the population covariance and the population standard deviations.
Why Use
Correlation not only reveals whether two variables are positively or negatively linearly related, but also defines the degree (strength) of this relation on a scale of −1 to +1 by standardizing the covariance.
Interpretation
A positive correlation indicates that both variables increase or decrease together. A negative correlation indicates that one variable increases when the other decreases and vice versa. There are six main classes of correlation (see Figure [REDACTED PHONE]). A strong positive correlation (for values larger than 0.8) indicates a strong linear relationship between the two variables; when variable X
Figure [REDACTED PHONE] Labeling correlation.
increases (or decreases), then variable Y also increases (or decreases) to a similar extent. A moderate positive correlation (values between 0.5 and 0.8) indicates that correlation exists but is not as intense as in a strong correlation. Observing a weak positive or weak negative correlation does not allow for reliable conclusions regarding correlation, especially when the values tend to zero. However, when the values lie between 0.3 and 0.5 (or between −0.5 and −0.3), and depending on the problem studied, we may label the correlation “substantial.” A moderate negative correlation (−0.8 to −0.5) means that correlation exists but is not very strong. Finally, a strong negative correlation (−1 to −0.8) indicates a strong linear relationship between the two variables, but with different directions: one decreases while the other increases, or vice versa.
A Pearson's correlation close to zero indicates that there is no linear correlation, but this does not preclude the existence of other types of relation, as in plots D and E in Figure [REDACTED PHONE]. Only by using a scatter plot can we assess the potential for other types of relation.
Discussion and Practical Guidelines
The correlation coefficient is a statistical test, and its results have to be checked for statistical significance based on the null hypothesis that there is no
Figure [REDACTED PHONE] Linear correlation examples. (A) Strong positive correlation. A linear regression fit has been superimposed on the data to highlight the linear relationship. (B) Strong negative correlation. (C) No correlation (independent variables). (D) No linear correlation, but a curve pattern appears in the data. We can either use a nonlinear model or transform the data. (E) No linear correlation, but a pattern is observed in the data.
correlation between the two variables. A p-value is calculated expressing the probability of finding the observed value (correlation coefficient) if the null hypothesis is true (see Section [REDACTED PHONE] for a detailed analysis). A significance level should be set in advance (e.g., [REDACTED PHONE]). If the p-value is smaller than the significance level, we reject the null hypothesis and conclude that the observed correlation is statistically significant at that level. When reporting correlation values, the results should always be accompanied by their p-values and a related statement regarding their significance.
The slope of a regression line (the line superimposed on the data) is not the same as the correlation coefficient value unless the two variables used are on the same scale (e.g., through standardization; see Figure [REDACTED PHONE]). The correlation provides a bounded measure [−1, 1] of the association between the two variables; the closer it is to 1 or −1, the closer it is to a perfect linear relationship. The slope of a regression line is not bounded by any limit and shows the estimated change in the expected value of Y for a one-unit change of X. This cannot be produced from the correlation itself. Although a positive slope is an indication of association and thus of positive correlation (i.e., the slope and the correlation have the same sign), it cannot provide the measure of this association, as the correlation coefficient does. Nevertheless, when the variables are standardized, the slope equals the correlation coefficient.
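Both points (r is bounded while the ordinary slope is not, and the two coincide after standardization) can be checked numerically. The data below are invented for illustration; the p-value mentioned above would come from a t test with n − 2 degrees of freedom, reported for instance by scipy.stats.pearsonr alongside r:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

# Pearson's r = Cov(X, Y) / (s_x * s_y); np.corrcoef returns the 2 x 2
# correlation matrix, so take an off-diagonal element.
r = np.corrcoef(x, y)[0, 1]

# Ordinary least-squares slope of y on x: unbounded, and in general
# different from r ...
slope = np.polyfit(x, y, 1)[0]

# ... unless both variables are standardized (z-scores), in which case
# the slope equals the correlation coefficient.
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
slope_z = np.polyfit(zx, zy, 1)[0]
```

Here r is close to 1 while the raw slope is near 2 (house size grows by about two units per unit of income in this toy data), and the standardized slope matches r.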
Data in A and B depict the relation of income and house size for a set of households in two different neighborhoods (see Figure [REDACTED PHONE]). Although the correlation is the same in both neighborhoods, the slope is different. Identical correlation means that in both neighborhoods, income is almost perfectly linearly related to house size (all dots lie on a line), and the points have
Figure [REDACTED PHONE] Correlation coefficient of income and house size for two different neighborhoods.
similar deviations from the line. A difference in the slopes would indicate that, in neighborhood A, the increase in house size for one additional unit of income is far larger than that in neighborhood B. Why this happens should be determined through additional data analysis. It might be because neighborhood A lies in the suburbs, where more space is available, while neighborhood B lies close to the city center, where houses in the same price range are smaller.
Finally, correlation is a measure of association, not of causation. Correlation is often used to claim that one action is the result of another. This assumption is wrong if made with no further analysis. Correlation establishes only that something is related to something else. Causation and relationship/association are different things. High correlation reveals a strong relation but not necessarily causation. Imagine a study on daily sales of ice cream along with daily sales of cold drinks during the summer. If we calculate their correlation, we will probably identify a strong correlation, since sales of ice cream and cold drinks both peak in the summer. Although this suggests an association or mathematical relationship (strong correlation), no functional relationship of causation is observed. It is not that high sales of ice cream drive (cause) the high sales of cold drinks (effect), nor is it the other way around. Another factor drives the relationship: temperature. High temperatures during summer drive (cause) the consumption (effect) of these products. Thus, the sales of ice cream and cold drinks have a functional relationship with temperature but not with each other; this type of correlation is called “spurious.” However, if we study the link between personal income and educational attainment, we will probably find a strong correlation, as people with higher income tend to have obtained at least a bachelor's degree. This is a sign of causation, as it is widely accepted that people with higher educational attainment are likely to get well-paid jobs. Still, this is merely an indication of causation, not a definite cause-and-effect relationship. Determining whether such a link between these two variables exists would require additional scientific analysis and meticulously designed statistical experiments through explanatory analysis.
[REDACTED PHONE] Pairwise Correlation
Definition
Pairwise correlation is the calculation of the correlation coefficients for all pairs of variables.
Why Use
When dealing with a large dataset, we can simultaneously calculate the correlations between all pairs of variables to quickly identify potential linear relationships.
Interpretation
For n variables, the result is a square n-by-n matrix with the correlation coefficient values stored in the off-diagonal cells (see Table 2.3). Diagonal cells have a value of 1, indicating the correlation of a variable with itself. The correlation is then interpreted as explained in Section [REDACTED PHONE].
Discussion and Practical Guidelines
We can also create a pairwise matrix plot, which displays in the off-diagonal cells (a) the scatter plots of variable pairs, (b) the correlation coefficients for each pair of variables and (c) the trend line. In the diagonal cells, the histogram of each variable is presented (see Figure [REDACTED PHONE]). This matrix is more informative than the scatter plot matrix, which includes only the scatter plots (see Section [REDACTED PHONE]).
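A sketch of the pairwise matrix with NumPy (the 4 × 3 data matrix is invented for illustration):

```python
import numpy as np

# Columns are the variables (three illustrative variables, four observations).
data = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 1.5, 4.0],
    [3.0, 3.5, 3.0],
    [4.0, 4.0, 2.0],
])

# rowvar=False tells np.corrcoef that each column is a variable.
# R has 1s on the diagonal (a variable with itself) and is symmetric.
R = np.corrcoef(data, rowvar=False)
```

In this toy data the first and third columns are exactly linearly decreasing together, so the corresponding off-diagonal cell equals −1.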
[REDACTED PHONE] General QQ Plot
Definition
A general QQ plot depicts the quantiles of one variable against the quantiles of another variable.
Why Use
This plot can be used to assess similarities in the distributions of two variables (see Figure [REDACTED PHONE]). The variables are ordered, and their cumulative distributions are calculated.
Interpretation
If the two variables have identical distributions, then the points lie on the 45° reference line; if they do not, then their distributions differ.
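A minimal sketch of how the plotted quantile pairs can be computed (NumPy's default interpolation rule is an assumption; other software may use slightly different quantile estimators):

```python
import numpy as np

def general_qq(x, y, k=21):
    """Matched quantiles of two variables for a general QQ plot.
    Points near the 45-degree line suggest similar distributions."""
    probs = np.linspace(0.0, 1.0, k)
    return np.quantile(x, probs), np.quantile(y, probs)
```

Plotting the returned pair of arrays against each other, together with the 45° line, reproduces the visual comparison described above.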
2.4 Rescaling Data

Definition
Rescaling is the mathematical process of changing the values of a variable to a new range.

Table 2.3 Pairwise correlation matrix for five variables. A solved example can be found in Chapter 6.
     Var1 Var2 Var3 Var4 Var5
Var1 [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE]
Var2 [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE]
Var3 [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE]
Var4 [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE]
Var5 [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE] [REDACTED PHONE]
Figure [REDACTED PHONE] Correlation pairwise matrix plot.
Figure [REDACTED PHONE] General QQ plot of Income (x-axis) and Expenses (y-axis). Points deviate from the 45° reference line, and their distributions cannot be regarded as identical.
Why Use
When variables have very different scales, it is very hard to compare them directly. By rescaling data, the spread and the values of the data change, but the shape of the distribution and the relative attributes of the curve remain unchanged. Large differences in value ranges and scales cause mainly two problems (de Vaus [REDACTED PHONE] p. [REDACTED PHONE]):
• First, comparing their descriptive statistics (such as mean, median and standard deviation) is hard, and interpretation is not straightforward.
• Second, we cannot combine data with widely differing upper and lower scores to create an index or ratio, as one variable might have a larger impact on the index formula due solely to its scale. The resulting values are likely to be too large or too small, which will also be hard to interpret.
Rescaling data is also widely used in multivariate data analysis for data reduction and data clustering (see Chapter 5). For example, as clustering methods are based on the calculation of a dissimilarity matrix containing the statistical distances (e.g., Euclidean distance) among data points, failing to rescale the data assigns disproportionately more importance to variables with significantly larger values than to the others.
Methods
To avoid such problems, three rescaling methods can be used:
• Normalize: The following formula is a typical method of creating common boundaries ([REDACTED PHONE]):

X_{\mathrm{rescale}} = \left( \frac{X - X_{\min}}{X_{\max} - X_{\min}} \right) n \qquad ([REDACTED PHONE])

where $X_{\mathrm{rescale}}$ is the rescaled value, X is the original value, $X_{\min}$ is the minimum value, $X_{\max}$ is the maximum value and n is the upper limit of the final variable, defined by the user. This rescaling method is sensitive to outliers, as they will scale the data to a very small interval (the $X_{\max} - X_{\min}$ range will be large). For n = 1, the rescaled variable ranges from 0 to 1. This is called “normalization” and scales all numeric variables to the range [0, 1]. We can also normalize data to the [−1, 1] range using ([REDACTED PHONE]):

X_{\mathrm{rescale}} = -1 + 2 \left( \frac{X - X_{\min}}{X_{\max} - X_{\min}} \right) \qquad ([REDACTED PHONE])
• Adjust: Another method of rescaling data is to divide a variable by a specific value (or multiply it by assigned weights). For example, instead of using dollars to describe a person's income, we could use “income”
divided by “average income of the country.” This new value has several advantages. Due to inflation, income is not directly comparable across years. This ratio expresses personal income relative to average income, which is comparable across years. This is called “adjustment”: we adjusted personal income to average income. Adjustment is the process of removing an effect (e.g., inflation, seasonality) to obtain more comparable data. Adjustments are also needed to compare incomes in different places, such as countries. We cannot directly compare the income of someone in a Western country to that of someone in a developing country because the income variable has wide differences within its range. Adjustments can be expressed in many other ways depending on the problem studied and the research question/hypothesis tested.
• Standardize: Calculate z-scores. A z-score is the number of standard deviations a score lies from the mean (see Section [REDACTED PHONE], Eq. [REDACTED PHONE]). Put simply, a standardized variable expresses the original variable values in standard deviation units (de Vaus [REDACTED PHONE] p. [REDACTED PHONE]). A standardized variable always has a mean of 0 and a variance of 1 (de Smith [REDACTED PHONE] p. [REDACTED PHONE]).
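The three methods can be sketched side by side in Python (the income values and the choice of the mean as the adjustment reference are invented for illustration):

```python
import numpy as np

x = np.array([10.0, 20.0, 40.0, 50.0])

# Normalize to [0, 1] (the n = 1 case): (x - min) / (max - min)
x01 = (x - x.min()) / (x.max() - x.min())

# Normalize to [-1, 1]: -1 + 2 * (x - min) / (max - min)
x11 = -1.0 + 2.0 * (x - x.min()) / (x.max() - x.min())

# Adjust: divide by a reference value, here the (hypothetical) mean income
x_adj = x / x.mean()

# Standardize: z-scores with the sample standard deviation,
# giving mean 0 and variance 1
z = (x - x.mean()) / x.std(ddof=1)
```

Note how the normalized versions are bounded while the z-scores are not, which is exactly the trade-off discussed next.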
Discussion and Practical Guidelines (Normalization vs. Standardization)
Normalization and standardization are widely used in statistics and subsequently in spatial analysis. But which method is more appropriate? There is no straightforward answer, but let us consider some basic differences:
• As a standardized variable always has a zero mean and a unit variance, it provides little information when we want to compare the means of two distributions. When the mean values are important, normalization is more descriptive.
• With standardization, the new values are not bounded. On the contrary, with normalization, we bound our data between 0 and 1 (or −1 and 1). Having comparable upper and lower bounds might be preferable and more meaningful for some studies (e.g., marketing analysis), especially when the mean values are important.
• In the presence of outliers, normalizing squeezes the non-outlying values into a very small interval. This does not happen with standardization.
• Standardization is most useful for comparing subgroups of the same variable, such as comparing rural and urban incomes (Grekousis et al. 2015a).
• Standardization is also used in multivariate data analysis (see Section 5.1) so that variables do not depend on the measurement scale and are comparable with each other (Wang [REDACTED PHONE] p. [REDACTED PHONE]). It is occasionally preferred to normalization as it better retains the importance of each variable because it is not bounded. For example, in the case of outliers,
normalized data are squeezed into a small range and, as such, contribute less to the final values when dissimilarities (through statistical distances) are calculated.
• Many algorithms (especially the learning algorithms of artificial neural networks) assume that data are centered at 0. In this case, standardization seems a more rational choice than normalization.
Keep in mind that rescaling is not always desirable. If we have data of similar scales or proportions (e.g., percentages), or we want to assign weights to the variables with larger values, we might not normalize, adjust or standardize. Whether and which rescaling type to apply depends on the problem in question and the available dataset.
2.5 Inferential Statistics
In the previous sections, we discussed the use of descriptive statistics and related ESDA tools in summarizing the key characteristics of the distribution of an attribute. However, to delve deeper into the statistical analysis of a problem, we should apply more advanced methods. For example, though we can summarize a sample, we cannot make any inference related to the statistical population it refers to. Descriptive analysis is accurate only for the specific sample we are analyzing, and the results and conclusions cannot be extended to the statistical population it was drawn from. Suppose we query 30 households in a neighborhood about income, house size and the number of cars owned by each family. Using descriptive statistics, we calculate the average family income, the number of cars owned per family and the frequency distribution of their house sizes. This is a very good start, but it does not tell us much about the wider area. For example, what is the average income of the census tract this neighborhood belongs to? What is the average house size in the city? What is the relation between income and the number of cars in the city? These questions attempt to generate results at a larger scale and cannot be directly answered using the descriptive statistics calculated from the 30-household sample. Making inferences from a sample to a population requires inferential statistics.
Definition
Inferential statistics is the branch of statistics that analyzes samples to draw conclusions about the entire population. In other words, through inferential statistics, we infer the characteristics of the population from the sample data.
Why Use
Inferential statistics are used when we need to describe the entire population through samples. With inferential statistics, we analyze a sample, and the findings
are generalized to a larger population. Descriptive statistics, by contrast, hold true only for the specific sample they were calculated for; this should be acknowledged in any analysis. Scholars often overgeneralize, using descriptive statistics to summarize populations. A sample can describe a population only if specific procedures (through inferential statistics) have been followed, and this should be clearly stated.
Importance to Spatial Statistics
Spatial statistics use inferential statistics to make inferences about a statistical population. For example, spatial autocorrelation tests, spatial regression models and spatial econometrics use inferential statistical methods. Being able to evaluate the results of spatial statistics and draw correct conclusions requires an understanding of inferential statistics. Incorrect interpretations of advanced spatial statistics are common and usually stem from ignorance of inferential statistics theory. A firm knowledge of inferential statistics is required for anyone undertaking spatial analysis through spatial statistics, and the next sections cover the rudiments of the following inferential statistics topics:
• What are parametric and nonparametric methods and tests?
• What is a test of significance?
• What is the null hypothesis?
• What is a p-value?
• What is a z-score?
• What is a confidence interval?
• What is the standard error of the mean?
• What is so important about the normal distribution?
• How can we identify whether a distribution is normal?
[REDACTED PHONE] Parametric Methods
Definitions
Parametric methods and tests are statistical methods that use parameter estimates for statistical inference (see Table 2.4; Alpaydin [REDACTED PHONE] p. 61). They assume that the sample is drawn from some known distribution (not necessarily normal) that obeys some specific rules. They belong to inferential statistics.
Population parameters are values calculated from all objects in the population and describe the characteristics of the population as a whole. Population parameters are fixed values. Each population parameter has a corresponding sample statistic.
Sample statistics are characteristics of a sample. They can be used to provide estimates of the population parameters. Sample statistics do not have fixed values and are associated with a probability distribution called a sampling distribution. In practice, any value calculated from a given sample is called a statistic (Alpaydin [REDACTED PHONE], p. [REDACTED PHONE]).
Parameter estimates are estimates of the values of the population parameters. They are computed from the sample using sample statistics (e.g., the sample mean, the sample variance). As soon as the parameters are estimated, they are plugged into the assumed distribution, and the final size and shape of the distribution for the specific dataset are determined (Alpaydin [REDACTED PHONE] p. 61). The most commonly used methods of estimating population parameters are maximum likelihood estimation and Bayesian estimation.
How Parametric Methods Work
To better understand how inferential statistics and parametric methods work, we should take a look at the following basic flowchart (see Figure [REDACTED PHONE]).
The parametric statistical approach is composed of four basic steps:
A) Population: Define the population that the study refers to.
B) Sampling: Using a sampling procedure, extract data from the entire population. We use samples because we cannot practically measure every single object of a population.
C) Sample: Make assumptions about the distribution of the entire population (for example, that it follows the normal distribution). Next, estimate the population parameters using sample statistics.

Table 2.4 Parametric and nonparametric statistics according to the scope of analysis and measurement level. Parametric and nonparametric statistics can be found in many textbooks, studies and papers. A scientist should have a rough idea about these tests in order to comprehend why they are selected among so many different test options and what they intend to identify. The table focuses on statistics most commonly used to determine if differences or correlations among variables exist.

Identify | Parametric statistics (normal) | Nonparametric statistics (non-normal) | Level of measurement (for nonparametric)
Difference between two independent groups | t test (interval) | Mann–Whitney U-test; Kolmogorov–Smirnov Z test; Chi square | Ordinal/Interval; Ordinal/Interval; Nominal/Ordinal/Interval
Difference between more than two independent groups | Analysis of variance and F test (interval) | Kruskal–Wallis analysis of ranks; Median test; Chi square | Ordinal/Interval; Interval; Nominal/Ordinal/Interval
Difference between two related groups | t test for dependent samples (interval) | Sign test; Wilcoxon's test; McNemar | Ordinal/Interval; Ordinal/Interval; Nominal/Ordinal/Interval
Correlation between variables | Pearson's r (interval) | Spearman's Rho; Kendall's tau; Gamma | Ordinal; Ordinal; Ordinal
                    

--- Page[REDACTED PHONE] ---
D) Inference: Plug these parameters into functions describing the assumed distribution (e.g., the probability density function) to obtain an estimation of the entire population.
Let us consider an example. Suppose we want to analyze the sales of a product according to the customers' age structure. The four basic steps are:
A) Population: Customers.
B) Sampling: We first randomly select our sample (random sampling is not the only sampling method) from the total population. The sample might be n people who filled in questionnaires.
C) Sample: We then assume that the variable "sales of product" is normally distributed over the age of a potential customer (the age intervals on the x-axis and the relative frequency of sales on the y-axis follow a normally shaped bell). We estimate two sample statistics: the sample mean (as an estimate of the population mean) and the sample standard deviation (as an estimate of the population standard deviation) using the respective formulas.
D) Inference: Applying these two estimates in the normal probability density function (Eq. 2.1) allows for further analysis. For example, what is the probability that this product will be selected by customers aged 20 to 25? If the probability obtained from the probability density function is not desirable (e.g., lower than the company's target), we might decide to invest more in advertisements targeting this age group. We can also calculate confidence intervals, which give the range of values within which the above estimates are likely to lie at a 95% certainty. By conducting simple parametric statistical analysis, we support decision making.

Figure [REDACTED PHONE] Parametric statistical approach.
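The inference step above can be sketched in a few lines of code. This is a minimal illustration, not the book's own procedure: the customer ages are hypothetical data, the sample mean and standard deviation are the parameter estimates, and the normal cumulative distribution function (built from the error function) gives the probability for the 20 to 25 age group.

```python
# Sketch of step D: plug sample estimates into the assumed normal distribution.
# The customer ages below are hypothetical illustration data.
import math
import statistics

ages = [22, 25, 28, 31, 34, 35, 37, 40, 42, 45, 47, 50, 53, 56, 60]

mean = statistics.mean(ages)   # estimate of the population mean
sd = statistics.stdev(ages)    # estimate of the population standard deviation

def normal_cdf(x, mu, sigma):
    """Cumulative probability of N(mu, sigma) up to x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Probability that a purchase comes from a customer aged 20 to 25,
# under the fitted normal model.
p_20_25 = normal_cdf(25, mean, sd) - normal_cdf(20, mean, sd)
print(f"mean = {mean:.1f}, sd = {sd:.1f}, P(20 <= age <= 25) = {p_20_25:.3f}")
```

If this probability falls below the company's target, the inference step would suggest investing in that age group, exactly as described above.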
Discussion and Practical Guidelines
Inferential statistics use sample statistics, so termed because they refer to a sample and not to the entire population. In general, the characteristics of samples are denoted by Latin letters and the characteristics of populations by Greek letters (see Table 2.2). Formulas for the same measure (e.g., standard deviation) may change depending on whether populations or samples are calculated. The term "statistic" is used only for samples. The term "parameter" refers only to a population. (Tip to remember: "population" starts with "P," as does "parameter"; "statistic" starts with "S," as does "sample.") Parameters are descriptive measures of the entire population and are fixed values. Parameter estimates are calculated from sample statistics and are not fixed; they are associated with probability distributions and margins of error.
To sum up, the main goal of inferential statistics is to estimate the original population parameters and measure the amount of error of these estimates based on the sample data available. In other words, as we cannot directly measure these parameters in the entire population, we use the sample to estimate the true parameters.
For example, the normal distribution needs only two parameters to be defined: the mean and the standard deviation. Once we estimate these parameters (by using the sample data), we can plug them into a probability distribution function (see Section [REDACTED PHONE]) to generate the distribution curve. The objective of describing the population is now achieved. In inferential statistics, each distribution is defined entirely by only a small number of parameters (usually one to three).
The parameters' values are called "estimates," as we do not calculate a value but produce an estimation with an associated error. The correct expression is "parameter estimate," not "parameter calculation." There are two basic approaches to evaluating a parameter estimate:
(a) Using a confidence interval (see Section [REDACTED PHONE])
(b) Hypothesis testing (see Section [REDACTED PHONE])
Parametric methods and tests are more accurate and have higher statistical power than nonparametric methods (see next section) if the assumption about the adopted distribution (e.g., normal) is true. Another advantage is that the problem is reduced to the estimation of a small number of parameters (e.g., mean and variance). Inferences are valid only if the assumptions made in the parametric approach hold true. When the assumptions (e.g., randomly selected and independent samples) fail, parametric tests have a greater chance of producing inaccurate results. In the spatial context, we have to assess whether the assumptions hold before we use a parametric test. Randomness, and thus complete spatial randomness, is rare in space due to spatial autocorrelation (a result of the spatial dependence present in most geographical problems – see Chapter 4). Spatial statistics that overcome these problems are therefore needed. However, the new tests share similar terminology with classic statistics and are also based on significance tests, hypothesis testing and confidence intervals.
[REDACTED PHONE] Nonparametric Methods
Definition
Statistical methods used when a normal distribution or another type of probability distribution is not assumed are called "nonparametric" (Alpaydin 2009, p. [REDACTED PHONE]). In nonparametric methods, we do not make assumptions about the distribution of the data or the parameters of the population we study. The distribution and the number of parameters are no longer fixed.
Why Use
If assumptions are violated, or if we cannot be certain whether they hold true, we may turn to nonparametric statistics, methods and models. Nonparametric models are not based on assumptions regarding either the sample (data) or the population it is drawn from (e.g., a linear relationship or a normal distribution). Nonparametric methods are based on the data, and their complexity depends on the size of the training dataset. The only assumption made is that similar inputs lead to similar outputs (Alpaydin 2009, p. [REDACTED PHONE]). In nonparametric methods, the set of parameters is not fixed and can increase or decrease according to the available data.
Discussion and Practical Guidelines
Nonparametric methods do not imply the absence of parameters. They imply the non-predefined nature of the analysis and suggest that parameters are flexible and adapt according to the data (for example, the nonparametric kernel density estimation method [Section [REDACTED PHONE]] has the smoothing parameter h). Nonparametric tests are widely used when populations can be ordered in ranked form. Ordinal data are thus well suited to nonparametric tests. One can rank interval data and apply nonparametric tests to them if needed. Nonparametric statistics make fewer assumptions and are simpler in structure; for this reason, their applicability is wide. Nonparametric methods include the Mann–Whitney U test (also known as the Wilcoxon rank-sum test), the Kolmogorov–Smirnov test, the Kruskal–Wallis one-way analysis of ranks, Kendall's tau, Spearman's rank correlation coefficient, kernel density estimation and nonparametric regression (de Vaus [REDACTED PHONE] p. 77; Table 2.4).
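As a small illustration of how a rank-based statistic works, the sketch below computes Spearman's rank correlation from first principles: both variables are converted to ranks and the classic d-squared formula is applied. The two short variables are hypothetical illustration data with no ties.

```python
# Minimal sketch of a nonparametric statistic: Spearman's rank correlation,
# computed by ranking both variables and applying rho = 1 - 6*sum(d^2)/(n(n^2-1)).
# The data are hypothetical and contain no ties.

x = [12, 5, 7, 20, 3, 15, 9]
y = [36, 18, 22, 55, 10, 30, 40]

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

rx, ry = ranks(x), ranks(y)
n = len(x)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(f"Spearman's rho = {rho:.3f}")  # -> Spearman's rho = 0.857
```

Because only the ranks enter the formula, the statistic is unaffected by how skewed the raw values are, which is exactly why such tests need no distributional assumption.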
                    

[REDACTED PHONE] Confidence Interval
Definition
A confidence interval is an interval estimate of a population parameter. In other words, a confidence interval is a range of values that is likely to contain the true population parameter value. A confidence interval is calculated once a confidence level is defined.
The confidence level for a confidence interval reflects the probability that the confidence interval contains the true parameter value. It is usually set to 95% or 99%. It should not be confused with the significance level (see Section [REDACTED PHONE]). A confidence level of 95% corresponds to a significance level of 5%.
Why Use
A confidence interval is used to estimate the range of values in which a population parameter lies for a certain confidence level (probability).
Interpretation
How accurately a statistic estimates the true population parameter is always an issue. The confidence interval of a statistic (e.g., the mean) estimates the interval within which the population parameter (e.g., the mean) ranges. The confidence interval is expressed in the same unit used for the variable.
Confidence intervals are constructed based on a confidence level X% defined by the user, such as 95% or 99%. The confidence level indicates that, if we conducted the sampling procedure many times, the confidence interval would include the estimated population parameter X out of 100 times. Keep in mind that the confidence level (for example, 95%) does not indicate that, for a given interval, there is a 95% probability that the parameter lies within this interval. It indicates that 95% of the experiments will produce an interval that includes the true mean, but 5% will not. Based on the definition of the confidence interval by Neyman (Neyman [REDACTED PHONE]), once an interval is defined, the parameter either is included in it or is not. As such, the probability does not refer to whether the population parameter lies inside the interval but to the reliability of the estimation process in producing an interval that includes the true population parameter.
Discussion and Practical Guidelines
We will not detail how confidence intervals are calculated, as computing them by hand is rare. It is more important to understand their use. Suppose we have selected a sample of households from a city and we want to estimate the mean household income for the entire city (population). If the sample's mean income is 15,000 US dollars and the margin of error for the 95% confidence level is ±500 US dollars, this typically means that we can be 95% confident that the mean income of the households in this city ranges between 14,500 and 15,500 US dollars (the confidence interval). There is still a 5% chance that the mean income lies outside this range. To reduce the width of the interval, we can use a larger sample size. This will reduce the standard error and thus the interval produced.
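A household-income interval of this kind can be sketched from the sample summary alone. The numbers below (sample standard deviation and size) are hypothetical and chosen so that the 95% margin of error comes out near ±500, matching the example above; the 1.96 multiplier is the standard normal value for the 95% level.

```python
# Sketch of a 95% confidence interval for a mean income, assuming we know only
# the sample summary. The summary numbers are hypothetical illustration values.
import math

mean_income = 15_000.0   # sample mean (US dollars)
s = 8_000.0              # sample standard deviation (US dollars)
n = 983                  # sample size

se = s / math.sqrt(n)    # standard error of the mean
margin = 1.96 * se       # margin of error at the 95% confidence level
low, high = mean_income - margin, mean_income + margin
print(f"95% CI: {low:,.0f} to {high:,.0f} (margin of error about ±{margin:,.0f})")
```

Doubling the sample size shrinks the standard error by a factor of √2, which is the practical route to a narrower interval noted above.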
Confidence intervals are different from significance tests (explained in Section [REDACTED PHONE]). A confidence interval provides a more complete view of a variable. Instead of deciding whether or not to reject the sample estimate, a confidence interval estimates the margin of error of the sample estimate (de Vaus [REDACTED PHONE] p. [REDACTED PHONE]). The margin of error is the value to be added to or subtracted from the statistic – e.g., the sample mean – and reflects the interval length. It reminds us that there is no absolute precision in any estimate.
Confidence intervals (and the standard error of the mean, discussed in the next section) are common in reports and papers related to geographical analysis, and one should be able to interpret these statistics in the context provided.
[REDACTED PHONE] Standard Error, Standard Error of the Mean, Standard Error of Proportion and Sampling Distribution
Definitions
The standard error of a statistic is the standard deviation of its sampling distribution (Linneman [REDACTED PHONE] p. [REDACTED PHONE]). The standard error reveals how far the sample statistic deviates from the actual population parameter.
The standard error of the mean is the standard deviation of the sampling distribution of the mean.
A sampling distribution is the distribution of a sample statistic over every possible sample of a given size drawn from a population.
The standard error of the mean refers to the change in the mean from sample to sample. This procedure is more straightforward than it seems. The standard error of the mean is calculated by the following formula (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]):

σ_x̄ = s / √n

where
s is the sample standard deviation of the distribution studied.
n is the sample size (number of objects).
When attribute values (scores) are expressed as proportions (P), the standard error is calculated by the following formula (Linneman [REDACTED PHONE] p. [REDACTED PHONE]):

σ_x̄ = √( P(1 − P) / n )

For example, if for n = 30 (a sample of students), 6 out of 30 (20%) smoke, then the standard error of this value would be (de Vaus [REDACTED PHONE] p. [REDACTED PHONE]):

σ_x̄ = √( 0.20 × (1 − 0.20) / 30 ) = 7.3%
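The two formulas can be wrapped as small helpers, and the smoking example above (6 of 30 students, P = 0.20) reproduces the 7.3% result; the s = 10, n = 100 line is an extra hypothetical illustration.

```python
# The two standard-error formulas above as helpers.
import math

def se_mean(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n)

print(f"SE of 20% from n = 30: {se_proportion(0.20, 30):.3f}")  # -> 0.073, i.e., 7.3%
print(f"SE of mean, s = 10, n = 100: {se_mean(10, 100):.2f}")   # -> 1.00
```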
Why Use
The standard error of a statistic shows how accurately this statistic estimates the true population parameter. The standard error of the mean, for example, is used to estimate how precisely the mean of a sample has been calculated. It measures how close the sample mean is to the real population mean. The standard error is also used to calculate the confidence interval based on the z-score and the confidence level (de Vaus [REDACTED PHONE] p. [REDACTED PHONE]).
Interpretation
Low values of the standard error of the mean indicate more precise estimates of the population mean. The larger the sample, the smaller the standard error calculated. This is rational, as the more objects we have, the closer to the real values our approximation will be. According to two rules from probability theory:
• There is a 68% probability that the population parameter is included in a confidence interval of ±1 standard error from the sample estimate.
• There is a 95% probability that the population parameter is included in a confidence interval of ±1.96 standard errors from the sample estimate (de Vaus [REDACTED PHONE] p. [REDACTED PHONE]).
Discussion and Practical Guidelines
When we estimate a population parameter (e.g., the mean), we obtain a single value of the statistic, also called the "point estimate." As this point estimate has been calculated from a sample, it is subject to sampling error. It is crucial to identify how much error this point estimate is likely to contain. In other words, we cannot be confident that one sample represents the population accurately. We would have to select more than one sample, calculate the statistic for each one and then analyze the resulting distribution of the statistic. As such, estimating population parameters usually requires the use of a sampling distribution and the calculation of the standard error, which reflects how well the statistic estimates the true population parameter.
For example, to calculate the standard error of the mean, we would take many samples and then calculate the mean of each one. The resulting distribution is the sampling distribution of the mean. According to the central limit theorem, for a sample size larger than 30 objects/observations (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]):
• The distribution of the sample means will be approximately normal (and increasingly so as the sample size increases), regardless of the shape of the population distribution.
• The mean of the sampling distribution of the mean approximates the true population mean.
As it is unfeasible to draw hundreds of samples to create a sampling distribution, we use a single sample drawn from the population, treat it as hypothetically belonging to a sampling distribution and then estimate the standard error of that sampling distribution (for the standard error of the mean, we use the formula σ_x̄ = s/√n given above [Linneman [REDACTED PHONE] p. [REDACTED PHONE]]).
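The "many samples" thought experiment can be run as a simulation. This sketch (hypothetical, for illustration) draws repeated samples from a deliberately skewed, non-normal population (an exponential with mean and standard deviation 10) and checks that the spread of the sample means comes out close to σ/√n, as the central limit theorem predicts.

```python
# Simulation sketch of the sampling distribution of the mean.
import random
import statistics

random.seed(42)
n = 50        # size of each sample
draws = 2000  # number of samples drawn

# Exponential population: mean 10, standard deviation 10 (skewed, not normal).
sample_means = [
    statistics.mean(random.expovariate(1 / 10) for _ in range(n))
    for _ in range(draws)
]

empirical_se = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = 10 / n ** 0.5                 # sigma / sqrt(n)
print(f"empirical SE = {empirical_se:.3f}, sigma/sqrt(n) = {theoretical_se:.3f}")
```

The two numbers agree closely even though the population itself is far from normal, which is the content of the two bullet points above.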
[REDACTED PHONE] Significance Tests, Hypothesis, p-Value and z-Score
Definition
A test of significance is the process of rejecting or not rejecting a hypothesis based on sample data. A test of significance indicates the probability that the results of the test are either due to sampling error or reflect a real pattern in the population the sample was drawn from (de Vaus [REDACTED PHONE] p. [REDACTED PHONE]). A test of significance is used to determine the probability that a given hypothesis is true.
The p-value is the probability of finding the observed (or more extreme) results of a sample statistic (test statistic) if we assume that the null hypothesis is true. It is a measure of how unlikely the observed value or pattern is to be the result of the process described by the null hypothesis. It is calculated based on the z-score.
The z-score (also called the z-value) expresses distance as the number of standard deviations between an observation (for hypothesis testing, calculated by the specific formula of a statistical test) and the mean. It is calculated (for samples) by the following formula:

z-score = (x_i − x̄) / s

where
x_i is the score of the ith object
x̄ is the sample mean value
s is the sample standard deviation
The z-score is widely used in standardization (see Section 2.4), in determining confidence intervals and in statistical significance assessments (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]).
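The z-score formula is easy to apply to a whole variable at once. The short sketch below standardizes a small hypothetical sample; after standardization the values have mean 0 and (sample) standard deviation 1, which is what makes z-scores comparable across variables.

```python
# The z-score formula applied to a small hypothetical sample.
import statistics

values = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0, 2.0, 9.0]
mean = statistics.mean(values)  # sample mean
s = statistics.stdev(values)    # sample standard deviation

z_scores = [(x - mean) / s for x in values]
print([round(z, 2) for z in z_scores])
# Standardized values have mean 0 and sample standard deviation 1.
```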
The significance level α is a cutoff value used to reject or not reject the null hypothesis. The significance level α is a probability and is user-defined, usually taking values such as α = 0.05, 0.01 or 0.001, which stand for the 5%, 1% and 0.1% probability levels. The smaller the p-value, the more statistically significant the results. A significance level of 5% corresponds to a confidence level of 95%.
Interpretation
In general, significance tests use samples to decide between two opposite statements (the null hypothesis and the alternative hypothesis). For example, a null hypothesis (H0) can be that the sample observations result purely from a random process. The alternative hypothesis (H1) states the opposite: that the sample observations are influenced by a nonrandom cause. This type of hypothesis is the most common in statistical testing. Another null hypothesis could be that there are no differences between two samples drawn from different distributions. The alternative hypothesis then states that there are differences between the samples. The statement of the null hypothesis can be set according to the problem, and the alternative is its opposite. Statistical tests reject or do not reject the null hypothesis. In this respect, they should be designed carefully to reflect the problem at hand.
• Rejecting the Null Hypothesis (p ≤ α)
A low p-value means that the probability that the observed results are the outcome of the null hypothesis under consideration is low. If the p-value is smaller than α, then we reject the null hypothesis.
Rejecting the null hypothesis means that there is a (100% − α) probability that the alternative hypothesis H1 is correct.
More analytically, we accept H1 as true but in relation to a probability, not a certainty. This means that there is always a chance that H1 will be accepted as true when it is not (a Type I error; de Vaus [REDACTED PHONE] p. [REDACTED PHONE]). For example, we might reject the null hypothesis with a probability of 95%, but there is still a 5% (significance level) chance that it is true.
• Not Rejecting the Null Hypothesis (p > α)
When the p-value is larger than the significance level α, we cannot reject the null hypothesis. In such a case, we have to be very careful in interpreting the results. We do not use the word "accept" for the null hypothesis. When we do not reject the null hypothesis, this does not mean that we accept it; it simply means that we fail to reject it, as there is insufficient evidence to do so.
Not rejecting the null hypothesis means that we do not have enough evidence to reject it, but we cannot accept it without further analysis.
For example, if we test whether a distribution is normal (H0: the distribution is normal) and p > α, we cannot straightforwardly accept that the distribution is normal. We have to examine other characteristics, such as plots or descriptive statistics, to conclude whether we will ultimately accept the null hypothesis. In correlation analysis using Pearson's r statistic, failing to reject the null hypothesis (H0: there is no linear correlation between two variables) does not necessarily mean that there is no correlation between the variables. If we create a scatter plot, we might trace a nonlinear correlation. An efficient way to avoid the vagueness of not rejecting the null hypothesis is to switch hypotheses. When deciding which alternative hypothesis to use, we have to consider our research objective. Some texts may say that not rejecting the null hypothesis means that you can accept it, but this should be done with caution.
Two types of error can result from a significance test (de Vaus [REDACTED PHONE] p. [REDACTED PHONE]):
• Type I error: rejecting the null hypothesis when it is true. This is determined by the significance level, which reflects the probability of a Type I error.
• Type II error: not rejecting the null hypothesis when we should.
In the spatial context, these errors relate to the multiple comparison problem and spatial dependence (see Section 4.6 on how to deal with these problems).
Discussion and Practical Guidelines
A typical workflow for significance testing is as follows (provided that a sample has been collected):
1. State the null hypothesis (e.g., that the sample is drawn from a population that follows a normal distribution).
2. State the opposite for the H1 hypothesis (e.g., that the sample is drawn from a population that is not normally distributed).
3. Specify the significance level α.
4. Select a test statistic (some formula) to calculate the observed value (e.g., the sample mean).
5. Compute the p-value: the probability of obtaining a result equal to (or more extreme than) the one observed in our sample if the null hypothesis is true.
6. Compare the p-value to the significance level α. If p ≤ α, then we can reject the null hypothesis and state that the alternative is true and that the observed pattern, value, effect or state is statistically significant. If p > α, then we cannot reject the null hypothesis, but we cannot accept it either.
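The six steps can be sketched end to end with a simple large-sample test. This is an illustration only, not a test from the book's case study: a one-sample z-test of a mean against a hypothesized value, with hypothetical summary numbers and a p-value obtained from the standard normal distribution via the error function.

```python
# The six-step workflow above, sketched as a large-sample one-sample z-test.
# H0: the population mean equals mu0; H1: it differs (two-sided).
# The sample summary values are hypothetical.
import math

mu0 = 100.0          # hypothesized population mean (steps 1 and 2)
alpha = 0.05         # significance level (step 3)

sample_mean = 103.2  # observed sample summary (step 4)
s = 15.0
n = 250

z = (sample_mean - mu0) / (s / math.sqrt(n))  # test statistic

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p_value = 2.0 * (1.0 - normal_cdf(abs(z)))    # two-sided p-value (step 5)

decision = "reject H0" if p_value <= alpha else "fail to reject H0"  # step 6
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision}")
```

Note that the output is the binary decision criticized just below: the p-value itself says nothing about how large the effect is.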
Significance tests are used to assess whether an apparent effect in the sample (e.g., a correlation or a difference among samples) is due to sampling error or to the existence of patterns in the population. One of the problems of using tests of significance is that they produce a binary yes/no (rejected/not rejected) answer, which does not entirely apply to the real world. In addition, significance tests do not provide a sense of the magnitude of any effect that the hypothesis is testing. In this respect, the confidence interval approach presented in the previous section is more informative (see Box 2.2).
Box 2.2 Think of significance and hypothesis tests as a trial. If you are accused of tax fraud, the tax bureau charges you. The null hypothesis is "guilty," and the alternative hypothesis is "not guilty." Your lawyer will bring forward evidence to prove that you are not guilty. If the evidence is sufficient, then the null hypothesis will be rejected and you will be declared innocent. If the evidence is not sufficient, then you cannot prove that you are not guilty, and you will most likely pay a fine. Still, this does not necessarily mean that you are guilty. Maybe the lawyer's evidence was not strong enough. In a parallel statistical world, the null hypothesis ("guilty") is rejected if there is strong evidence (sample data with small errors). If the evidence is not sufficient, you cannot reject the null hypothesis, but this does not mean that the null hypothesis is true either.
Example in a Geographical Context
Suppose we study the following geographical research question: do people in cities earn more money than people in rural areas? This is a typical comparison question and is very interesting from the social and geographical perspectives. The key element here is location. Our research question is whether the rural–urban distinction affects income. It is a binary question, so hypothesis testing can offer valuable insights. Suppose we collected our samples and found that people living in cities have a 50% higher mean income than people living in rural areas. The real question is then as follows:
Is the difference in mean income between rural and urban areas real, or is it just a random result due to sampling error (caused by not asking the right people about their income)?
In other words, can we state in a statistically sound way that this reflects the entire population? Remember that, even if we have the perception that people usually earn more money in cities, we have to prove it or provide a statistical context that supports our belief. Otherwise, it is just an opinion.
We could use the following hypotheses (statements) for our research question:
H0 = People in cities do not have a mean income different from that of people in rural areas (null hypothesis).
H1 = People in cities have a mean income different (50% higher) from that of people in rural areas (alternative hypothesis).
                    

The two samples here are "people living in cities" (e.g., 1,[REDACTED PHONE] people asked about their income during the last year) and "people living in rural areas" (e.g., 1,[REDACTED PHONE] people asked about their income during the last year). The population characteristic to be analyzed is "income."
To answer this question (or, more correctly, to attempt to come to a conclusion), we have to run a significance test. If the entire population could be measured, we would not need such tests. We could decide which statement is correct based on the entire population. As this is usually impossible, significance tests are necessary.
• Rejecting the null hypothesis
If the null hypothesis is rejected, we reject the hypothesis that the two samples and their distributions are the same (not different). In other words, there is a "good chance" that there is a difference between the distributions and that this difference is not just a result of randomness. This means that there are some patterns (reasons) by which these distributions are different. The "good chance" is estimated with a probability value using the significance level.
If we select α = 0.05 as the significance level and the resulting p-value calculated by the test is p = [REDACTED PHONE], then we can reject the null hypothesis because p < α. This means that H1 is true and that there are differences between urban and rural incomes. The research hypothesis that income in cities is 50% higher than that in villages (as calculated earlier) is now statistically backed up. This result is typically expressed in statistical language as follows:
People living in urban areas have an income statistically different (50% higher) from that of people living in rural areas at the 5% significance level.
The chance of finding the observed differences (e.g., in mean income) if the null hypothesis is true (no differences) is only p = [REDACTED PHONE], or [REDACTED PHONE]%.
This can be stated using the significance level as follows:
There is a less than 5% chance that this difference is the result of sampling error.
In other words, we have a 95% (100% − α) probability (confidence) that our distributions are different.
Again, the results of all statistical significance tests are based on probability distribution functions. They produce probabilities, not certainties. In the preceding example, the conclusion is not that the "distributions are different" but that the "distributions are likely to be different with a probability of 95%." There is always a 5% chance that the distributions are similar. As a result, the smaller the significance level, the higher the chance that the results are close to the real values. The 95% probability in our example means that, if we randomly selected 100 different samples from the population, people living in cities would have incomes 50% higher than people living in rural areas in 95 of the cases. More generally, 95 of the cases would reject the null hypothesis. Still, there is always the chance that five samples would not display the hypothesized difference.
• Not rejecting the null hypothesis
Suppose that, for the same α = 0.05 significance level, the resulting p-value calculated by the test were p = [REDACTED PHONE]. In this case, we cannot reject the null hypothesis because p > α. If we cannot reject the null hypothesis, we have to state the following:
We have insufficient evidence to reject the null hypothesis that people in cities do not have a mean income different from that of people in rural areas.
Not rejecting the null hypothesis does not mean that we accept it, nor that the observed difference between the distributions is wrong. As we cannot reach a solid conclusion, we have to carry out other experiments and use other methods to decide whether incomes differ between rural and urban areas.
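The urban–rural comparison can be sketched numerically. The book does not name a specific test here, so the sketch below uses a large-sample two-sample z-test on the difference in means (a reasonable choice for samples of around 1,000); all summary numbers are hypothetical.

```python
# Sketch of the urban-rural income comparison as a large-sample two-sample
# z-test on the difference in means. All summary numbers are hypothetical.
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical sample summaries (incomes in US dollars).
urban_mean, urban_sd, n_urban = 30_000.0, 12_000.0, 1_000
rural_mean, rural_sd, n_rural = 20_000.0, 9_000.0, 1_000

# Standard error of the difference between two independent means.
se_diff = math.sqrt(urban_sd**2 / n_urban + rural_sd**2 / n_rural)
z = (urban_mean - rural_mean) / se_diff
p_value = 2.0 * (1.0 - normal_cdf(abs(z)))  # two-sided p-value

alpha = 0.05
print(f"z = {z:.1f}, reject H0: {p_value <= alpha}")
```

With summaries like these, the difference is many standard errors wide, so the null hypothesis of equal mean incomes would be rejected at the 5% level.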
2.6 Normal Distribution Use in Geographical Analysis
Importance to Spatial Analysis
Spatial analysis is commonly used to study either the distribution of the locations of events/polygons or the spatial arrangement of their attributes (i.e., socioeconomic variables). As geographical analysis is an interdisciplinary field, many reports and research papers use purely statistical methods to supplement the core spatial analysis. For example, suppose we want to analyze the spatial distribution of income in relation to educational attainment. It is reasonable to begin with classic statistical analysis, such as the calculation of Pearson's correlation coefficient between income and educational attainment. Spatial statistics, such as bivariate spatial autocorrelation, can then be applied to determine whether these two variables tend to cluster together in space. In geographical analysis, spatial statistics go hand in hand with classical statistics.
Statistics often deal with the normal distribution because many well-defined statistical tests (e.g., Pearson's correlation coefficient, analysis of variance, the t-test, regression, factor analysis) are based on the assumption that the examined distribution is normal. If the observed distribution does not resemble a normal distribution (e.g., it is skewed), then many statistical procedures are not accurate and cannot be used, or should be used with caution. For example, if a distribution is skewed, the probability of obtaining small values (compared to large values) differs. If data are collected through random sampling, different proportions of values are highly likely to fall into specific intervals. This will lead to an overrepresentation or underrepresentation of specific values in the sample that should be taken into account. We should highlight here that most statistics are based on a normal underlying distribution for attribute values and on a Poisson probability distribution for point patterns (Anselin [REDACTED PHONE] p. 4). The Poisson probability distribution is used to assess the degree of randomness in point patterns (Oyana & Margai [REDACTED PHONE] p. 75; Illian et al. [REDACTED PHONE] p. 57).
How to Identify a Normal Distribution
There are three simple methods of determining if a distribution is normal or
not:
1. Create a histogram and superimpose a normal curve. Plot inspection can
enable a rough estimation of whether the distribution approximates the
normal curve.
2. Calculate the skewness and kurtosis of the distribution. If the distribution
is markedly skewed and/or the kurtosis is very high or very low, we have a
clear indication that the distribution is not normal (see Section[REDACTED PHONE]).
3. Create a normal QQ plot (see Section[REDACTED PHONE]).
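The second and third checks can also be run programmatically. The sketch below is a minimal illustration, not part of the book's labs: it uses a hypothetical skewed sample and SciPy's skewness, kurtosis and Shapiro-Wilk test in place of visual inspection.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical income-like sample: lognormal, hence positively skewed
income = rng.lognormal(mean=10.0, sigma=0.5, size=200)

skewness = stats.skew(income)          # ~0 for a normal distribution
excess_kurt = stats.kurtosis(income)   # excess kurtosis, ~0 for normal
stat, p_value = stats.shapiro(income)  # H0: sample comes from a normal population

looks_normal = abs(skewness) < 0.5 and p_value > 0.05
```

For the skewed sample above, the skewness is clearly positive and the Shapiro-Wilk p-value falls below 0.05, so the sample fails the normality screen; the same three numbers computed on a variable such as Income give a quick check before applying parametric tests.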
What to Do When Distribution Is Not Normal
If the distribution is not normal, we have three options (we assume that outliers
have been removed):
Option 1. Use nonparametric statistics (see Table 2.4).
Option 2. Apply a variable transformation. An efficient way to deal with a
non-normal distribution is to transform it (if possible) into a normal distribution.
Table 2.5 presents transformations that can be applied to a
variable according to its skewness value. A transformation will not
necessarily lead to a normal distribution, but there is a good chance that
it will.
Option 3. Check the sample size. According to the central limit theorem, if
the sample is larger than 30-40, parametric statistics may be used without
affecting the credibility of the results. This theorem states that, given certain
conditions, as the size of a random sample increases, the distribution
of its mean approaches a normal distribution. In other words, even if our
distribution is not normal, we can use parametric statistics if we have a large
sample (de Vaus[REDACTED PHONE] p. 78). Such a violation of the normality assumption
does not cause major problems (Pallant[REDACTED PHONE]). It is not easy to
define the ideal value above which a sample can be regarded as large.
According to the literature, values of 30 to 40 are regarded as sufficient
for a sample to be considered large and to follow the central limit theorem.
In spatial analysis, this means that we need a sample of more than
30 spatial entities (e.g., postcodes, cities, countries) to use parametric
statistics. When fewer spatial entities are involved, it is essential to
check the variables for normality if we want to make inferences for a
larger population.
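Option 3 rests on the central limit theorem. The simulation below (illustrative only, with made-up data) shows that although an exponential population is strongly skewed, the means of repeated samples of size 40 are distributed almost symmetrically:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
population = rng.exponential(scale=1.0, size=100_000)  # strongly right-skewed

# Means of 2,000 random samples of size n = 40 (the rule-of-thumb threshold)
sample_means = np.array([rng.choice(population, size=40).mean()
                         for _ in range(2_000)])

pop_skew = stats.skew(population)      # close to 2 for an exponential
means_skew = stats.skew(sample_means)  # much closer to 0
```

This is why parametric statistics on the sample mean remain credible for reasonably large samples even when the underlying variable is skewed.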
2.7 Chapter Concluding Remarks
• Exploratory spatial data analysis and related tools offer a comprehensive
visual representation of statistics by linking graphs, scatter plots or histograms
with maps.
• Data are just numbers stored in tables in a database management
system. By analyzing data, we add value, creating information and then
knowledge.
• Spatial statistics employ statistical methods to analyze spatial data, quantify
a spatial process, discover hidden patterns or unexpected trends and model
these data in a geographic context.
• Choropleth maps are typically the first thing created after a geodatabase
is built.
• Designing and rendering choropleth maps is an art form. As professionals,
we have to create accurate maps and graphs, always use solid statistics to
back up our findings and avoid misleading messages.
• Inspecting the basic characteristics of a variable is essential prior to any
sophisticated statistical or spatial analysis. Calculating the mean value,
maximum value, minimum value and standard deviation of a variable
provides an initial description of its distribution.
• Creating a frequency distribution histogram and inspecting for potential
normality are also necessary.

Table 2.5 Transformations to reduce skewness and restore normality.
High positive skew
  Transformation: Reciprocal or negative reciprocal. The reciprocal of a ratio
  may often be interpreted as easily as the ratio itself; for example, population
  density (people per unit area) becomes area per person.
  Formula: Yn = 1/Y or Yn = −1/Y (add 1 to values less than 1, down to 0)
  Used when: Reciprocals are useful when all values are positive. The negative
  formula is used to transform negative values. The reciprocal transformation
  reverses the order among positive values: the largest becomes the smallest.

Moderate to low positive skew
  Transformation: Square root or logarithmic transformation
  Formula: Yn = Y^0.5; Yn = log y or Yn = ln y
  Used when: The variable may have many zeros or very small values, may have
  a physical exponent (e.g., area), or may have only positive values.

Low to moderate negative skew
  Transformation: Power (square)
  Formula: Yn = Y^2
  Used when: The variable may have a logarithmic trend (decay, survival, etc.).

High negative skew
  Transformation: Power (cube)
  Formula: Yn = Y^3
  Used when: The variable may have a logarithmic trend (decay, survival, etc.).
• In general, calculating the common measures of center, shape and
spread gives quick insights into the dataset and should be conducted
prior to any other analysis.
• Boxplots are very helpful descriptive plots, as they depict measures of center,
shape and spread and allow for comparison of two or more distributions.
• Scatter plot matrices and pairwise correlation matrices give a snapshot of
the relationships among pairs of variables in a dataset. It is advisable to
build such plots right from the beginning to gain quick insights into
the data.
• Locating outliers is necessary, as statistical results might be distorted if
outliers are not removed or properly handled.
• Rescaling variables with large differences in their scale and range,
through normalization, adjustment or standardization, is necessary when
we want to compare them or to use them in the same formula.
• While independence of observations is assumed in classical statistics,
spatial dependence usually exists in spatial statistics, and classical statistics
should be modified accordingly.
• A test of significance is the process of rejecting or not rejecting a hypothesis
based on sample data. It is used to determine the probability that a
given hypothesis is true.
• A p-value is the probability of finding the observed (or more extreme)
results of a sample statistic (test statistic) if we assume that the null
hypothesis is true.
• Rejecting the null hypothesis means that there is a probability (calculated
as the difference: [REDACTED PHONE]% − α) that the alternative hypothesis H1 is
correct (α is the significance level).
• Not rejecting the null hypothesis means that there is not sufficient evidence
to reject the null hypothesis, but we cannot accept it either
without further analysis.
• Statistics often deal with the normal distribution because many well-defined
statistical tests are based on the assumption that the examined distribution
is normal.
Questions and Answers
The answers given here are brief. For more thorough answers, refer back to the
relevant sections of this chapter.
Q1. Why are spatial statistics used?
A1. Spatial statistics employ statistical methods to analyze spatial data, quantify
a spatial process and discover hidden patterns or unexpected trends
in these data in a geographic context. Spatial statistics are built upon
statistical concepts, but they incorporate location parameters such as
coordinates, distance and area. They extend classic statistical measures
and procedures and offer advanced insights for data analysis. In geo-
graphical analysis, spatial statistics are not used separately from statistics
but in complementary ways.
Q2. What is the main difference between spatial statistics and descriptive
statistics?
A2. There is a fundamental difference between classical and spatial statistics. In
classical statistics, we make a basic assumption regarding the sample: it is a
collection of independent observations that follow a specific, usually normal,
distribution. In contrast, in spatial statistics, because of the inherent spatial
dependence and the fact that spatial autocorrelation (usually) exists, the
focus is on adopting techniques for detecting and describing these
correlations. In other words, in classical statistics independence of observations
is assumed, while in spatial statistics spatial dependence usually exists.
Classical statistics should be modified accordingly to adapt to this condition.
Q3. What is a choropleth map, and why is it used?
A3. Choropleth maps are thematic maps in which areas are rendered according
to the values of the variable displayed. Through choropleth maps, we
visually identify whether values cluster together or exhibit similar
spatial patterns. Rendering choropleth maps is usually the first task when
spatial data are joined to nonspatial data (e.g., attributes from a census).
Q4. What are the two main types of choropleth maps? Give examples.
A4. There are two main categories of variables displayed in choropleth maps:
(a) spatially extensive variables and (b) spatially intensive variables. For
spatially extensive variables, each polygon is rendered based on a measured
value that holds for the entire polygon (for example, total population,
total households or the total number of children). In the spatially
intensive category, the values of the variable are adjusted for the area
or some other variable. For example, population density, income per
capita and the unemployment rate are spatially intensive variables
because they take the form of a density, ratio or proportion.
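The extensive-to-intensive adjustment in A4 is a simple division. A toy sketch with made-up postcode values (not the book's dataset):

```python
# Hypothetical postcodes: (id, total population, area in km^2)
postcodes = [
    ("PC1", 12_000, 4.0),
    ("PC2", 3_000, 1.0),
    ("PC3", 30_000, 20.0),
]

# Spatially extensive: total population holds for the whole polygon.
# Spatially intensive: density (people per km^2) is comparable across polygons.
density = {pid: pop / area for pid, pop, area in postcodes}
```

Mapping the raw totals would make PC3 look dominant, while the density values show that PC1 and PC2 are actually the more crowded areas, which is why choropleth maps usually display the intensive form.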
Q5. What are the measures of spread? Name a few.
A5. Measures of spread (also called measures of variability, variation, diversity
or dispersion) provide information about how much the values of a variable
differ among themselves and in relation to the mean. The most common
measures are the range, deviation from the mean, variance and standard
deviation.
Q6. What is an outlier? Why should we trace outliers?
A6. Outliers are the most extreme scores of a variable. They should be traced
for three main reasons: (a) outliers might be wrong measurements, (b)
outliers tend to distort many statistical results, and (c) outliers might hide
significant information worth discovering and analyzing further.
Q7. How should we handle outliers?
A7. We can handle traced outliers based on the following guidelines:
• Scrutinize the original data (if available) to check whether the
outliers' scores are due to human error (e.g., data entry). If the scores
are correct, attempt to explain such high or low values.
• Transform the variable. However, data transformation does not
guarantee the elimination of outliers. In addition, it may not be desirable
to transform the entire dataset for only a small number of outliers.
• Delete the outlier from the dataset or change its score to the value
lying three standard deviations from the mean.
• Temporarily remove the outlier from the dataset and calculate the
statistics. Then include the outlier again in the dataset for further
analysis.
Q8. What is the Pearson's correlation coefficient? Does it reveal association
or causation?
A8. A correlation coefficient r(x,y) analyzes how two variables (X, Y) are
linearly related. Among the correlation coefficient metrics available,
the most widely used is Pearson's correlation coefficient (also called
the Pearson product-moment correlation). Correlation is a measure of
association, not of causation. Causation and relationship/association are
different. High correlation reveals a strong relation but not necessarily
causation.
Q9. How can a Pearson's correlation coefficient be interpreted (how many
classes of correlation exist)?
A9. There are six main classes of correlation. A strong positive correlation (for
values larger than 0.8) indicates a strong linear relationship between the
two variables; when variable X increases (or decreases), variable Y
also increases (or decreases) to a similar extent. A moderate positive
correlation (values between 0.5 and 0.8) indicates that correlation exists
but is not as intense as in a strong correlation. Observing a weak positive or
weak negative correlation does not allow for reliable conclusions regarding
correlation, especially when the values tend to zero. However, when the
values lie between 0.3 and 0.5 (or between −0.5 and −0.3), and depending
on the problem studied, we may label the correlation as "substantial." A
moderate negative correlation (between −0.8 and −0.5) means that
correlation exists but is not very strong. Finally, a strong negative
correlation (between −1 and −0.8) indicates a strong linear relationship
between the two variables, but in different directions: one decreases
while the other increases, or vice versa.
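The six classes in A9 can be wrapped in a small helper. This is an illustrative sketch (the function name, thresholds simplified to the ±0.5/±0.8 cut points, and the data are ours, not the book's), using SciPy's `pearsonr`:

```python
from scipy import stats

def correlation_class(r: float) -> str:
    """Label a Pearson r using the six classes described in A9."""
    if r >= 0.8:
        return "strong positive"
    if r >= 0.5:
        return "moderate positive"
    if r >= 0.0:
        return "weak positive"
    if r > -0.5:
        return "weak negative"
    if r > -0.8:
        return "moderate negative"
    return "strong negative"

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]  # hypothetical, nearly linear in x
r, p_value = stats.pearsonr(x, y)
label = correlation_class(r)
```

For the nearly linear toy data, r is close to 1 and falls in the "strong positive" class; remember that even a strong r describes association, not causation.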
Q10. What is a typical workflow of a significance test?
A10. A typical workflow for significance testing is as follows (provided that a
sample has been collected):
1. Make the statement for the null hypothesis (e.g., that the sample is drawn
from a population that follows a normal distribution).
2. Make the opposite statement for the H1 hypothesis (e.g., that the sample
is drawn from a population that is not normally distributed).
3. Specify the significance level α.
4. Select a test statistic (some formula) to calculate the observed value (e.g.,
the sample mean).
5. Compute the p-value, which is the probability of finding the observed (or
more extreme) results of our sample if we assume that the null hypothesis
is true. Put otherwise, the p-value is the probability of obtaining a result
equal to (or more extreme than) the one observed in our sample if the
null hypothesis is true.
6. Compare the p-value to the significance level α. If p ≤ α, then we can
reject the null hypothesis and state that H1 is true and that the observed
pattern, value, effect or state is statistically significant. If p > α, then we
cannot reject the null hypothesis, but we cannot accept it either.
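The six steps map directly onto any test run in software. A hypothetical one-sample t-test (SciPy assumed, made-up data) makes the workflow concrete: H0 is that the population mean is 0, H1 that it is not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=1.0, scale=1.0, size=50)  # hypothetical sample

alpha = 0.05                                              # step 3: significance level
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)  # steps 4-5: statistic, p-value

reject_h0 = p_value <= alpha  # step 6: p <= alpha -> reject H0 in favor of H1
```

Because the sample is centered well away from 0, the p-value falls below α and H0 is rejected; with p > α we would instead fail to reject H0 without accepting it.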
LAB 2
EXPLORATORY SPATIAL DATA ANALYSIS (ESDA): ANALYZING AND
MAPPING DATA
Overall Progress
Scope of the Analysis: Income and Expenses
This Lab deals with
• Objective 1: Locating high-income areas (see Table 1.2).
• Objective 4: Identifying socioeconomic drivers of high monthly expenses
(including those for coffee-related services).
Figure[REDACTED PHONE] Lab 2 workflow and overall progress
As explained in Lab 1, the investor is interested in identifying, among other
things:
1. The areas where the residents have high incomes (i.e., the areas in which
clients will have stronger purchasing power)
2. Whether there is any relation between specific socioeconomic variables
(e.g., educational attainment) and monthly expenses (including coffee
products and services). This will allow us to better delineate the
demographic profile of the target group.
To address the preceding questions, we use simple ESDA tools and descriptive
statistics (see Figure[REDACTED PHONE] ). A more thorough analysis for Objective 1 is carried
out in Lab 4, and a deeper analysis for Objective 4 is presented in Labs 6 and 7.
Section A ArcGIS
Exercise 2.1 ESDA Tools: Mapping and Analyzing the Distribution of Income
In this exercise, we (a) map Income (yearly average income of the residents living
in each postcode), (b) plot related graphs and (c) apply descriptive statistics to
analyze income's spatial and statistical distribution.
ArcGIS Tools to be used: Choropleth map, Histogram, Normal QQ plot,
Boxplots, Z-score rendering
ACTION: Create choropleth map of income
Navigate to the location where you have stored the book dataset and click Lab2_SimpleESDA.mxd
Main Menu > File > Save As > My_Lab2_SimpleESDA.mxd
In I:\BookLabs\Lab2\Output
TOC > RC City > Properties > TAB = Symbology > Quantities >
Graduated colors > (see Figure[REDACTED PHONE])
Value = Income
Color Ramp = Yellow to Brown
Classes = 4 > Click Classify > Break Values > [REDACTED PHONE] > Enter > [REDACTED PHONE] > Enter
> [REDACTED PHONE] > Enter > [REDACTED PHONE] > OK (see also Figures[REDACTED PHONE] and 1.13)
RC Label > Format Labels > Numeric > Rounding > Number of
decimal places = 2 > OK > Apply > OK
TOC > RC City > Save As Layer File >
Name = Income.lyr
In I:\BookLabs\Lab1\Output
Exercise 2.1 (cont. )
Tip: Saving City as a layer file allows us to save the income representation. Keep
in mind that when you add the layer to the table of contents, it receives the
name of the original shapefile (i.e., City in this example) and not the
name it was saved as (i.e., Income.lyr).
Interpreting results: Four groups of postcodes by average annual income are
created: (a) a group with income of less than 15,[REDACTED PHONE] per year (low-income
areas), (b) a group with income between 15,[REDACTED PHONE] and 20,[REDACTED PHONE] (average-income
areas), (c) a group with income between 20,[REDACTED PHONE] and 25,[REDACTED PHONE] (average- to high-
income areas) and (d) a group with income over 25,[REDACTED PHONE] (high-income areas;
see Figure[REDACTED PHONE]). The map shows that high-income areas are centrally
located (dark red). Most of the postcodes with average-to-high or high
incomes lie inside the downtown area (red polygon). Lower-income areas
can be found in the northern, western and southern postcodes. This depiction
of the Income variable provides a first indication of how income is
spatially distributed (the spatial arrangement of values) across the postcodes
of the city (see Box 2.3). To analyze how the values of Income are distributed
in relation to the mean value (not spatially in this case), we can use the
frequency distribution histogram.
Figure[REDACTED PHONE] Layer properties dialog box for rendering Income
Exercise 2.1 (cont. )
Box 2.3 Analysis Criterion C1 to be used in synthesis Lab 5.4: The loca-
tion of a coffee shop should lie within areas with: Income >20,[REDACTED PHONE] euros .
[C1_HighIncome.shp ]
Main Menu > Selection > Select By Attributes
Layer = City
SELECT * FROM City Where: "Income" >= [REDACTED PHONE]
OK
TOC > RC City > Data > Export Data > Output feature class: I:\BookLabs\Lab2\Output\C1_HighIncome.shp
Main Menu > Selection > Clear Selected Features
Add the shapefile to the data view just to inspect the output, but
you can then remove it to proceed with the exercise.
Figure[REDACTED PHONE] Income map classified into four groups
Exercise 2.1 (cont. )
ACTION: Create histogram
Main Menu > Customize > Extensions > Check Geostatistical
Analyst > Close (if already checked, do not uncheck)
Main Menu > Customize > Toolbars > Check Geostatistical Analyst
Toolbar = Geostatistical Analyst > Geostatistical Analyst >
Explore Data > Histogram (see Figure[REDACTED PHONE])
Select Layer = City (see Figure[REDACTED PHONE])
Attribute = Income
Figure[REDACTED PHONE] Geostatistical Analyst toolbar.
Figure[REDACTED PHONE] Histogram of income
Exercise 2.1 (cont. )
Move the graph to the left corner of the map, as shown in Figure[REDACTED PHONE].
TOC > RC City > Open Attribute Table
Select a bin in the graph and examine the highlighted postcodes
in the map.
On the histogram, click Add to Layout > Move the graph to the
lower left corner of the layout area > Get back to the Data View
> Close the histogram plot
Main Menu > Selection > Clear Selected Features > Close the
attribute table of City
Main Menu > File > Save
Figure[REDACTED PHONE] Brushing capabilities of ESDA tools. By selecting a bin in the graph, the
corresponding polygons in the map and the rows in the attribute table of the City
layer are highlighted.
Exercise 2.1 (cont. )
Interpreting results: The histogram depicts basic descriptive statistics of
income (e.g., mean value, standard deviation, skewness, kurtosis, first and
third quartiles; see Figure[REDACTED PHONE]). Results show that the distribution of income
is positively skewed and deviates significantly from a normal distribution. To
better test whether the distribution is normal, we should build a normal QQ plot
(see Figure[REDACTED PHONE]).
The histogram is linked to the shapefile's polygons and the layer's attribute
table, which is one of the benefits of using ESDA tools in spatial
analysis. By brushing a bin in the histogram, the related polygons are
highlighted in the map and the relevant rows of the attribute table are
selected (see Figure[REDACTED PHONE]). Likewise, if we select a polygon or a row in the
attribute table, the relevant bin in the histogram will be highlighted.
A graphical inspection shows a bin at the far right of the histogram that
should be further analyzed, as it might reveal the existence of outliers
(postcodes with extremely large income values).
Postcodes with extreme values of income (spatial outliers) can be
traced using the standard deviation, by defining an outlier as an observation
that lies at least 2.5 standard deviations from the mean (see Section[REDACTED PHONE]).
The standard deviation and the mean of Income are 4,[REDACTED PHONE] and 16,[REDACTED PHONE],
respectively (see Figure[REDACTED PHONE]). Thus, an income value is an outlier if it is
larger than

Outlier > Mean + 2.5 × Standard Deviation = [REDACTED PHONE] + 2.5 × [REDACTED PHONE] = [REDACTED PHONE]

or if it is smaller than

Outlier < Mean − 2.5 × Standard Deviation = [REDACTED PHONE]

By sorting Income in the attribute table of the layer City, we check whether
postcodes with incomes above or below these values exist. In fact, two
postcodes are labeled as outliers, those having income values larger than
28,[REDACTED PHONE].
An alternative way to test for outliers and also obtain a different repre-
sentation of the Income distribution is by using boxplots and z-score
rendering (see Figures[REDACTED PHONE] and[REDACTED PHONE]).
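The 2.5-standard-deviation rule used above can be sketched outside ArcGIS as well. The values below are made up, with two deliberately extreme incomes appended:

```python
import numpy as np

rng = np.random.default_rng(3)
# 48 ordinary postcode incomes plus two extreme ones (all hypothetical)
income = np.append(rng.normal(16_000, 1_500, size=48), [30_000, 32_000])

mean, std = income.mean(), income.std(ddof=1)
upper = mean + 2.5 * std
lower = mean - 2.5 * std
outliers = income[(income > upper) | (income < lower)]
```

Only the two appended values exceed the upper fence here; note that the extreme values themselves inflate the standard deviation, so with very few observations this rule can fail to flag genuine outliers.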
ACTION: Create a normal QQ plot to identify whether income follows a
normal distribution
Toolbar = Geostatistical Analyst > Geostatistical Analyst >
Explore Data > Normal QQ Plot
Exercise 2.1 (cont. )
Select Layer = City (see Figure[REDACTED PHONE] )
Attribute = Income
Select the upper-right dots on the plot and examine the
highlighted polygons (see Figure[REDACTED PHONE]).
On the Normal QQ plot, click Add to Layout > Move the graph next
to the histogram plot > Get back to the Data View > Close the
Normal QQ plot
Main Menu > Selection > Clear Selected Features
Interpreting results: The normal QQ plot reveals that income values
deviate from the straight line of a normal distribution (see Figure[REDACTED PHONE]).
We can thus argue that the variable Income is not normally distributed.
By brushing the points at the far right of the plot, we locate the postcodes
that deviate significantly from the line (the expected values of income if the
distribution were normal; see Figure[REDACTED PHONE]). These postcodes belong to
the high-income group and are also clustered inside the downtown area, an
interesting finding from the spatial analysis perspective.
Figure[REDACTED PHONE] Normal QQ plot for income
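A normal QQ plot can also be produced programmatically; SciPy's `probplot` returns the quantile pairs plus a best-fit line whose correlation r drops when the points bend away from the line. The data below are hypothetical, not the book's:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
income = rng.lognormal(mean=9.7, sigma=0.6, size=90)  # right-skewed, hypothetical

# (osm, osr): theoretical vs. observed quantiles; (slope, intercept, r): fit line
(osm, osr), (slope, intercept, r) = stats.probplot(income, dist="norm")

deviates_from_line = r ** 2 < 0.99  # points off the straight line -> non-normal
```

Plotting `osm` against `osr` reproduces the QQ plot; for this skewed sample the fit correlation is visibly below that of a normal sample, matching the visual deviation described above.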
Exercise 2.1 (cont. )
ACTION: Create box plot
Main Menu >View >Graphs >Create Graph >
Graph type = Box Plot (see Figure[REDACTED PHONE] )
Layer/Table = City
Value field = Income >Next >
Title = Income >Finish
Brush the extreme outlier in the graph (marked with *) and
locate the postcode in the map (see Figure[REDACTED PHONE] ).
Figure[REDACTED PHONE] Locating postcodes with high income (far right of the QQ plot) that deviate
signi ﬁcantly from the normal distribution line[REDACTED PHONE] Exercise 2.1 ESDA Tools
Exercise 2.1 (cont.)
RC on the Boxplot > Add to Layout > Move the boxplot to an empty
space > Get back to Data View > Close graph >
Main Menu > Selection > Clear Selected Features
Interpreting results: The boxplot depicts three mild outliers (symbolized with
a dot) and one extreme outlier (symbolized with an asterisk) (see Figure[REDACTED PHONE];
see Section[REDACTED PHONE] for how mild and extreme outliers are defined based on the
interquartile range, not the standard deviation). The mild and extreme
outliers are located on the upper side of the box, referring to high income
values. No outliers with low income values are traced. By brushing the extreme
outlier dot in the boxplot, we locate on the map the respective postcode in
the downtown area (see Figure[REDACTED PHONE]). To decide how to handle the extreme
outlier identified earlier, we should go through the practical guidelines
specified in Section[REDACTED PHONE]. First, we check whether this observation is an
incorrect database entry. In this dataset, the value is correct. Second, we should
assess whether such a value is reasonable based on (a) our knowledge of the
specific area (if any) and (b) common sense. It is quite common to trace areas
with remarkably higher income than others, as income inequality exists in most
regions worldwide. Our dataset does not seem to deviate from this trend.
Figure[REDACTED PHONE] Boxplot dialog box
Exercise 2.1 (cont.)
Additionally, an Athenian or a visitor would also know that the area where
income clusters (including the outlier) is near the central city square (Constitutional
Square), where the National Parliament is located. The square is
surrounded by many beautiful neoclassical buildings, a city park, the Acropolis
archeological site and a residential area, which has historically attracted
higher-income classes. It seems quite reasonable for these high and
extremely high income values to lie there. For this reason, and in the
context of the specific project (coffee shop market analysis), we should not
remove the observation or transform the dataset, as it provides valuable
information. In a different context, though, we might have to remove this
observation. For example, if we are mostly interested in the social analysis of
the average- to low-income regions, then removing the extreme outlier
would make the results more realistic: the mean income would be smaller,
better reflecting the real economic condition of the majority of the citizens,
and the standard deviation and other descriptive statistics (e.g., confidence
intervals, quartiles) would change as well. In other words, removing the
outlier would portray the socioeconomic profile of the residents better.
From the social analysis perspective, the existence of outliers in our
case study reveals large income inequality, which is a significant finding.
Figure[REDACTED PHONE] Brushing the extreme outlier in the boxplot to locate the postcode in the
map. All graphs are saved on the layout view of the .mxd
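The boxplot's mild/extreme distinction follows the usual interquartile-range fences (1.5 × IQR for mild, 3 × IQR for extreme, consistent with the section cited above). A sketch with hypothetical incomes:

```python
import numpy as np

income = np.array([13_000, 14_200, 14_800, 15_100, 15_500, 15_900,
                   16_200, 16_800, 17_300, 22_500, 31_000])  # hypothetical

q1, q3 = np.percentile(income, [25, 75])
iqr = q3 - q1
mild = income[(income > q3 + 1.5 * iqr) & (income <= q3 + 3.0 * iqr)]
extreme = income[income > q3 + 3.0 * iqr]
```

With these numbers, 22,500 falls between the inner and outer upper fences (a mild outlier) and 31,000 lies beyond the outer fence (an extreme outlier), mirroring the dot and asterisk symbols in the boxplot.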
ACTION: Calculate and render income z-scores. Identify income outliers.
Another way to map income is by calculating and rendering its z-score. By
such rendering, we map the deviations of income for each postcode from
the mean value. As such, we trace areas with similar or different values from
the average income, which also allows for locating spatial outliers.
TOC > RC City > Open Attribute Table
Click the Table Options button > Add Field >
Name = IncZScore
Type = Float
Precision = 5 (stores the total number of digits on both sides
of the decimal place)
Scale = 3 (stores the number of digits to the right of the
decimal place)
OK
RC the IncZScore column > Field Calculator > (click Yes if a pop-up message
appears about calculating outside an edit session) > In the IncZScore
field type ([Income] - [REDACTED PHONE])/[REDACTED PHONE] > OK > Close table (see Figure[REDACTED PHONE])
See Exercise 1.1: Mean = [REDACTED PHONE], Standard deviation = [REDACTED PHONE].6
ArcToolbox > Spatial Statistics Tools > Rendering > ZScore
Rendering (see Figure[REDACTED PHONE])
Input Feature Class = City
Field to Render = IncZScore
Output Layer File = I:\BookLabs\Lab2\Output\IncZScore.lyr
OK
Next, identify as outliers those postcodes with IncZScore larger than 2.5.
Main Menu > Selection > Select by Attributes >
Layer = IncZScore (see Figure[REDACTED PHONE])
Method = Create a new selection
DC "IncZScore"
Exercise 2.1 (cont. )
Figure[REDACTED PHONE] Calculating the z-score of Income through the Field Calculator.
Exercise 2.1 (cont. )
Go to the "SELECT * FROM City WHERE:" window and after "IncZScore"
type: >= 2.5
OK
Main Menu >Selection >Clear Selected Features
Main Menu >File >Save
Interpreting results: The map in Figure[REDACTED PHONE] depicts the z-scores of income
for each postcode. The higher/lower the z-score, the larger the difference
between the income value of the postcode and the mean annual income for
the entire study region. Red areas have annual incomes more than two
standard deviations above the mean income and cluster in the downtown
area. Values more than 2.5 standard deviations above or below the mean
reveal potential spatial outliers. Based on this definition, two postcodes
with high income can be labeled as spatial outliers (those highlighted with a
light blue outline). No outliers with extremely low incomes are identified.
It is obvious that different outlier definitions lead to slightly different results
(see Figure[REDACTED PHONE]). How many and which outliers will finally be retained
depends on the analysis.
Figure[REDACTED PHONE] Select by attributes dialog box
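The Field Calculator expression and the 2.5 cutoff amount to a plain z-score computation. A standalone sketch with hypothetical incomes (not the book's dataset):

```python
import numpy as np

income = np.array([15_300, 14_900, 16_000, 15_600, 16_400,
                   15_100, 15_800, 16_200, 15_500, 29_000])  # hypothetical

# Same formula as the IncZScore field: (value - mean) / standard deviation
z = (income - income.mean()) / income.std(ddof=1)
spatial_outliers = income[np.abs(z) > 2.5]
```

Here only the one deliberately extreme value exceeds the 2.5 cutoff; rendering `z` instead of the raw incomes is exactly what the ZScore Rendering tool does on the map.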
Exercise 2.1 (cont. )
ArcGIS Tip: If you want to run this exercise again from scratch, you should
(a) Delete City.shp located in I:\BookLabs\Lab2\Output
(b) Copy City.shp located in I:\BookLabs\Data and paste it into I:\BookLabs\Lab2\Output (so that the field IncZScore is not contained)
Exercise 2.2 Bivariate Analysis: Analyzing Expenditures by Educational Attainment
In this exercise, we study whether educational attainment is related to monthly
expenditures (including coffee-related expenses). The following variables
are analyzed: Expenses, University and SecondaryE. Expenses is the
Figure[REDACTED PHONE] Mapping and rendering z-scores of income. Two postcodes have z-scores
larger than[REDACTED PHONE]
Exercise 2.2 (cont. )
average monthly expenses per person in euros for everyday costs (e.g.,
grocery shops, coffee shops). University is the percentage of people living
in a postcode who have a bachelor ’s degree, while SecondaryE is the
percentage of people who have completed secondary education. We create
a scatter plot and a scatter plot matrix to graphically determine if any linear
or nonlinear relations exist among the aforementioned three variables. This
type of analysis provides us with initial information about the relative rela-
tionships among the variables. It does not allow us to quantify the real effect
of one variable on another. To do so, we should apply more advanced
methods, such as regression, spatial regression or spatial econometrics
(discussed in Chapters 6 and 7). We are still describing and exploring our
dataset and not explaining deeper relations, causes and effects.
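A quick quantitative companion to the visual inspection is Pearson's correlation coefficient, which summarizes the linear association a scatter plot shows. The sketch below uses hypothetical values standing in for the City layer (the real dataset's numbers are not reproduced here):

```python
import numpy as np

# Hypothetical postcode-level values, illustrative only.
university = np.array([10, 18, 25, 33, 41, 52, 60, 68])          # % with a bachelor's degree
expenses   = np.array([310, 340, 365, 400, 430, 470, 500, 540])  # mean monthly expenses (EUR)
secondary  = np.array([55, 40, 62, 35, 58, 44, 50, 47])          # % with secondary education

# Pearson's r: close to +1/-1 indicates a strong linear relation, near 0 none.
r_univ = np.corrcoef(university, expenses)[0, 1]
r_sec = np.corrcoef(secondary, expenses)[0, 1]
print(round(r_univ, 3), round(r_sec, 3))
```

Like the scatter plot, correlation describes association only; it does not quantify the effect of one variable on another, which requires the regression methods of later chapters.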
ArcGIS Tools to be used: Scatter plot ,Scatter plot matrix
ACTION: Create scatter plot
Navigate to the location you have stored the book dataset and
click My_Lab2_SimpleESDA.mxd
Main Menu >View >Graphs >Create Graph >
Graph type = Scatter plot
Figure[REDACTED PHONE] Scatter plot of University over Expenses
Layer/Table = City
Y field = Expenses
X field (optional)= University
Uncheck Add to legend >Next >Finish
RC on graph >Add to layout
Get back to the Data View >Brush the dots in the upper-right corner and
observe the selected polygons in the map (see Figure[REDACTED PHONE] ).
Main Menu >Selection >Clear Selected Features >Close Graph >
Save
Interpreting results: The dots of the scatter plot are colored according to
the income of each postcode (see Figure[REDACTED PHONE] ). By brushing the points in the
Figure[REDACTED PHONE] Brushing between scatter plot and map
upper-right (high education –high expenditures) of the scatter plot, the
related postcodes are highlighted in the map. Selected features reveal two
interesting ﬁndings: (a) the postcodes are spatially clustered, and (b) most of
the postcodes belong to the high-income group (dark red polygons). This
reveals colocation among higher education, expenditures and higher
income.
The scatter plot reveals that there is a nearly linear relation between
University and Expenses (see Figure[REDACTED PHONE]). The higher the percentage
of people with a university degree in a postcode, the more money they spend
on average on monthly expenses. We observe that the postcodes lying on
the right part of the graph (high education –high expenditures) are clustered
in the city center, where high-income postcodes (dark brown) are also
clustered (see Figure[REDACTED PHONE] ). As a result, we have a ﬁrst sign of colocation
among high-income, high educational attainment and high expenditures.
ACTION: Create scatter plot matrix
Main Menu >View >Graphs >Create Scatter plot Matrix Graph (see
Figure[REDACTED PHONE] )>
Layer/Table = City (see Figure[REDACTED PHONE] )
1 Field name = Expenses
2 Field name = University
3 Field name = SecondaryE
Check Show Histograms >Next >Finish
RC on graph >Add to layout >Get back to the Data View >Close
Scatter plot Matrix
Main Menu >File >Save
Figure[REDACTED PHONE] Create scatter plot matrix graph
Interpreting results: The scatter plot matrix depicts all combinations among
Expenses, University and SecondaryE (see Figure[REDACTED PHONE] ). The upper-right
graph is an enlargement of the selected scatter plot on the matrix (in this case,
the lower left). The histograms of each variable are also presented. We observe
that the variable Expenses is positively skewed, meaning that people in most of the
postcodes spend less than the average. Still, in some postcodes, people spend
well above the average. Looking at the scatter plots, there is no apparent relation
between secondary education and monthly expenditures (random cloud of
points).
On the other hand, there is a positive relationship between having a
bachelor ’s degree and monthly expenditures (see also Figure[REDACTED PHONE] ). In other
words, the level of monthly expenses, including coffee consumption, is not
linked to whether people have obtained secondary education but is related
to having a university degree. The higher the percentage of degree holders,
the more they are likely to spend. As a result, the coffee shop owner would
Figure[REDACTED PHONE] Scatter plot matrix wizard. Selection of histograms
probably prefer to target an area with a high percentage of people possess-
ing a bachelor ’s degree, as it seems that they are willing to spend more in
this type of market. For additional quantification of the effects of education
and other variables on expenses, see Labs 6 and 7.
Section B GeoDa
Exercise 2.1 ESDA Tools: Mapping and Analyzing the Distribution of Income
In this exercise, we (a) map Income (yearly average income of the residents
living in each postcode), (b) plot related graphs and (c) apply descriptive
statistics to analyze income ’s spatial and statistical distribution.
GeoDa Tools to be used: Choropleth map, Histogram, Boxplots,
Custom breaks
Figure[REDACTED PHONE] Scatter plot matrix among Expenses, University and SecondaryE.
ACTION: Create choropleth map of income
Navigate to the location you have stored the book dataset and
click the Lab2_SimpleESDA_GeoDa.gda
RC over a polygon on the Map >Change Current Map Type >Custom
Breaks >Create New Custom Breaks >Income >OK>Leave New
Categories Title as default and press OK (see Figures[REDACTED PHONE] ,
[REDACTED PHONE] ,[REDACTED PHONE] as well)
Breaks = User Defined
Categories = 4
Write the following values in the break fields: break 1 = [REDACTED PHONE]
/ break 2 = [REDACTED PHONE] / break 3 = [REDACTED PHONE] >Enter >Close the dialog
box >Save
Figure[REDACTED PHONE] Choropleth map of income
Interpreting results: See Section A, Exercise 2.1.
ACTION: Create histogram
Main Menu (see Figure[REDACTED PHONE] )>Explore >Histogram >Variable
Settings = Income >OK
RC over the histogram (see Figure[REDACTED PHONE] )>Choose Intervals >
Intervals = 10 >OK
RC over the histogram >View >Display Statistics
Select a bin in the graph and examine the highlighted post codes
in the map (see Figure[REDACTED PHONE] ).
Save
Figure[REDACTED PHONE] Histogram of income
Interpreting results: See Section A, Exercise 2.1.
ACTION: Create box plot
Main Menu >Explore >Box Plot >First Variable (X) = Income >
OK
Brush the extreme outlier in the graph (marked with O) and
locate the postcode in the map (see Figure[REDACTED PHONE] ).
Save
Interpreting results: See Section A, Exercise 2.1.
Figure[REDACTED PHONE] Brushing capabilities of ESDA tools
ACTION: Calculate and map income z-scores. Identify income outliers
Main Menu >Table >Calculator >TAB = Univariate
Click Add Variable
Name = IncZScore (see Figure[REDACTED PHONE] )
Type = real
Insert before = after last variable
Length = 5
Decimals = 3 (stores the number of digits to the right of the
decimal place)
Figure[REDACTED PHONE] Boxplot of income along with statistics. Brushing the extreme outlier in
the boxplot highlights the corresponding postcode in the map
Add >Select INCZSCORE (see Figure[REDACTED PHONE])
Operator = STANDARDIZED (Z)
Variable/Constant = Income
Apply >Close
RC over a polygon on the Map >Change Current Map Type >Custom
Breaks >Create New Custom Breaks >First Variable (X) =
INCZSCORE >OK>New Custom Categories Title: IncZScore >OK
Assoc.Var = INCZSCORE
Breaks = User Defined
Categories = 5
Write the following values in the break fields: break 1 = -2.5 >break 2
= -1 >break 3 = 1 >break 4 = 2.5. Close the dialog box >Save (see
Figure[REDACTED PHONE])
Figure[REDACTED PHONE] Add variable dialog box.
Figure[REDACTED PHONE] Calculating z-score through calculator
Interpreting results: See Section A, Exercise 2.1.
GeoDa TIP: To solve the exercise again, remove the field INCZSCORE from
CityGeoDa.shp stored in I:\BookLabs\Lab2\GeoDa. Open the table by
selecting the table icon (see Figure[REDACTED PHONE]) and then RC INCZSCORE >Delete
Variable >INCZSCORE >Delete >Close
Exercise 2.2 Bivariate Analysis: Analyzing Expenditures by Educational Attainment
See Section A, Exercise 2.2, for an introduction to this exercise.
GeoDa Tools to be used: Scatter plot ,Scatter plot matrix
ACTION: Create scatter plot
Navigate to the location you have stored the book dataset and
click Lab2_SimpleESDA_GeoDa.gda
Main Menu >Explore >Scatter plot >
Figure[REDACTED PHONE] Mapping and rendering z-scores of income. Postcodes with z-scores higher
than 2.5 (or smaller than -2.5) are outliers
Figure[REDACTED PHONE] Scatter plot dialog box.
Figure[REDACTED PHONE] Scatter plot for University and Expenses along with a regression line
superimposed, and related statistics
Independent Var X = University (see Figure[REDACTED PHONE] )
Dependent Var Y = Expenses
OK
Brush the points in the upper right (high education-high
expenditures) of the scatter plot to highlight the related postcodes in the
map (see Figure[REDACTED PHONE] ). You can also brush subregions in the map and see if
the slope of the regression changes. A systematic change in regression
geometry in relation to neighboring areas indicates spatial heterogeneity.
For details on regression statistics, see Chapter 6 .
Interpreting results: See Section A, Exercise 2.2.
ACTION: Create scatter plot matrix
Main Menu >Explore >Scatter plot Matrix >
Variables = Expenses >Click on the arrow pointing to the right (see
Figure[REDACTED PHONE] )
Variables = University >Click on the arrow pointing to the right
Variables = SecondaryE >Click on the arrow pointing to the right
Close the dialog box
Figure[REDACTED PHONE] Create scatter plot matrix graph
You can save the graph by right-clicking and then selecting Save Image As.
Interpreting results: See Section A, Exercise 2.2 .
Figure[REDACTED PHONE] Scatter plot matrix among Expenses, University, and SecondaryE.
Regression lines and slope values, with significance indicated by one asterisk
(*, p <[REDACTED PHONE]) or two asterisks (**, p <[REDACTED PHONE]), are also presented (see also Figure[REDACTED PHONE],
which is a subplot of Figure[REDACTED PHONE]).
3 Analyzing Geographic Distributions
and Point Patterns
THEORY
Learning Objectives
This chapter deals with

• Calculating basic statistics for analyzing geographic distributions, including mean center, median center, central feature, standard distance and standard deviational ellipse (centrographics)
• Explaining how these metrics can be used to describe spatial arrangements of different sets of point patterns
• Defining locational and spatial outliers
• Introducing the notions of complete spatial randomness, first-order effects and second-order effects
• Analyzing point patterns through average nearest neighbor analysis
• Ripley's K function
• Kernel density estimation
• Randomness and the concept of spatial process in creating point patterns

After a thorough study of the theory and lab sections, you will be able to

• Use spatial statistics to describe the distribution of point patterns
• Identify locational and spatial outliers
• Use statistical tools and tests to identify if a spatial point pattern is random, clustered or dispersed
• Use Ripley's K and L functions to define the appropriate scale of analysis
• Use kernel density functions to produce smooth surfaces of points' intensity over space
• Apply centrographics, conduct point pattern analysis, apply kernel density estimation and trace locational outliers through ArcGIS
3.1 Analyzing Geographic Distributions: Centrography
Centrographic statistics are tools used to analyze geographic distributions
by measuring the center, dispersion and directional trend of a spatial
arrangement. The centrographic statistics in most common use are the
mean center, median center, central feature, standard distance and standard
deviational ellipse. Centrographic statistics are calculated based
on the location of each feature; this is their major difference from descriptive
statistics, which concern only the nonspatial attributes of spatial
features.
3.1.1 Mean Center
Deﬁnition
Mean center is the geographic center of a set of spatial features. It is
a measure of central tendency and is calculated as the average of the x_i and
y_i values of the centroids of the spatial features ([3.1]; see Figure 3.1A).

\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad \bar{Y} = \frac{\sum_{i=1}^{n} y_i}{n}    (3.1)

where
n is the number of spatial objects (e.g., points or polygons)
x_i, y_i are the coordinates of the i-th spatial object (centroid in the case of polygons)
\bar{X}, \bar{Y} are the coordinates of the mean center
The mean center can also be calculated with weights (3.2). For example, we
can calculate the mean center of cities based on their population or income
(see Figure 3.1B).

\bar{X} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad \bar{Y} = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}    (3.2)

where
w_i is the weight (e.g., income or population) of the i-th spatial object
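Equations (3.1) and (3.2) translate directly to code. The sketch below is illustrative: the function name and the sample coordinates and populations are hypothetical.

```python
import numpy as np

def mean_center(xs, ys, weights=None):
    """Mean center (3.1) or, if weights are given, weighted mean center (3.2)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    if weights is None:
        return float(xs.mean()), float(ys.mean())
    w = np.asarray(weights, float)
    return float((w * xs).sum() / w.sum()), float((w * ys).sum() / w.sum())

# Hypothetical city centroids; the large population of the third city
# pulls the weighted mean center toward it.
x, y, pop = [0.0, 2.0, 4.0], [0.0, 2.0, 0.0], [100, 100, 600]
print(mean_center(x, y))       # unweighted mean center
print(mean_center(x, y, pop))  # weighted mean center, shifted toward the populous city
```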
Why Use
The mean center is used to identify the geographical center of a distribution
while the weighted mean center is used to identify the weighted
geographical center of a distribution. Mean and weighted centers can be
used to compare the distributions of different types of features
(e.g., crime distribution to police station distribution) or the distributions
at different time stamps (e.g., crime during the day to crime during
the night).
Interpretation
The mean center offers little information if used on its own for a single
geographical distribution. It is used mostly to compare more than one geo-
graphic distribution of different phenomena, or in time series data to trace
trends in spatial shifts (see Figure 3.2A and B). It should be noted that
different geographic distributions might have similar mean centers (see
Figure 3.2C ). A weighted mean center identi ﬁes the location of a weighted
attribute. In this respect, it is more descriptive (compared to the mean
center), as it indicates where the values of a speci ﬁc attribute are higher
(see Figure 3.1B ).
Discussion and Practical Guidelines
The mean center will be greatly distorted where there are locational outliers.
Locational outliers should thus be traced before the mean center is calculated.
The mean center can be applied to many geographically related problems,
such as crime analysis or business geomarketing analysis. Let us consider two
examples.
• Crime analysis: Police can identify the central area of high criminality by
using a point layer of robberies and calculating the mean center (see
Figure 3.1A ). This would not be very informative, however, as the mean
center would probably lie close to the centroid of the case study area. If
Figure 3.1 (A) Mean center of points ’spatial distribution (e.g., cities). The mean center is
not an existing point in the dataset. (B) Weighted mean center of a spatial distribution of
cities using population as weight. The weighted mean center has been shifted
downward relative to the non-weighted mean center (see Figure 3.1A ) because the
cities in the south are more highly populated.
data are available for multiple time stamps, the mean center calculation
could reveal a shift toward speci ﬁc directions and thus identify a crime
location trend (see Figure 3.2A ). Police may further seek to determine
why the trend exists and allocate patrols accordingly.
• Geomarketing analysis: Suppose we analyze two different point sets for
a city. The first refers to the location of banks (circles) and the second to
the location of hotels (squares) (see Figure 3.2B and C). The calculation
of the mean center of each point set will probably show different
results. We expect that banks will probably be located in the central
business district, while hotels closer to the historic center. In this case,
the two mean centers will probably lie far apart from each other (see
Figure 3.2B ). Still, the two mean centers might lie close together if the
central business district lies close to or engulfs the historic city center (see
Figure 3.2C ). The mean center is not very informative on its own, but it
provides the base upon which more advanced spatial statistics can
be built.
Figure 3.2 (A) Mean center shift for time series data. (B) Different distributions with
different mean centers; this is more informative. (C) Different distributions with the same
mean center.
3.1.2 Median Center
Deﬁnition
Median center is a point that minimizes the travel cost (e.g., distance) from
the point itself to all other points (centroids in the case of polygons) in the
dataset (see Figure 3.3 ). It is a measure of central tendency, calculated as
shown in ( 3.3):
\text{minimize} \sum_{i=1}^{n} d_i    (3.3)

where
n is the number of spatial objects (e.g., points or polygons)
d_i is the distance of the i-th object to the potential median center

It can also be used with weights (e.g., population, traffic load), calculated as (3.4):

\text{minimize} \sum_{i=1}^{n} w_i d_i    (3.4)

where
w_i is the weight of the i-th object. Weights can be positive or negative, reflecting the pulling or pushing effects of the events on the location of the median center (Lee & Wong[REDACTED PHONE], p. 42).
Figure 3.3 (A) Median center. (B) Weighted median center, when using population as
weight, shifted to the right.
Why Use
The median center is a measure of the central tendency of spatial data and can
be used to ﬁnd the location that minimizes total travel cost (or weighted cost in
the case of a weighted median center).
Interpretation
The median center is a new location; it is not necessarily one of the points
that exist in the layer. The median center is not as prone to locational outliers
as the mean center is. When there are many spatial objects, the median center is
more suitable for spatial analysis. The median center can also be calculated
based on weights. For example, to locate a new hospital, we could calculate
the median center of postcodes based on their population. The median center
will be the location where the total travel cost (total distance) of the population
in each postcode to the median center is minimized.
Discussion and Practical Guidelines
There is no direct solution for ﬁnding the median center of a spatial dataset.
A final solution can be derived only by approximation. The iterative algorithm
suggested by Kuhn & Kuenne ([REDACTED PHONE]) is a common method used to find the
median center (Burt et al.[REDACTED PHONE]). The algorithm searches for the solution/
location that minimizes the total Euclidean distance to all available points. If
more than one location minimizes this total cost, the algorithm will calculate
only one. Examples related to crime and geomarketing analyses include the
following:
• Crime analysis: By calculating the median center of crime, police can
locate the point that minimizes police vehicles ’total travel cost to the
most dangerous (high-crime) areas.
• Geomarketing analysis: Locating a new shop in a way that minimizes the
total distance to potential customers in nearby areas.
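The iterative approximation can be sketched as a Weiszfeld-type iteration, in the spirit of the algorithm the text cites. This is an illustrative implementation, not the book's code; the function name and convergence settings are assumptions.

```python
import numpy as np

def median_center(points, weights=None, iters=200, eps=1e-9):
    """Approximate the (weighted) median center: the location minimizing
    total (weighted) Euclidean distance to all points (eqs. 3.3 / 3.4)."""
    pts = np.asarray(points, float)
    w = np.ones(len(pts)) if weights is None else np.asarray(weights, float)
    # Start from the weighted mean center and iteratively re-weight by 1/distance.
    c = (w[:, None] * pts).sum(axis=0) / w.sum()
    for _ in range(iters):
        d = np.linalg.norm(pts - c, axis=1)
        d = np.where(d < eps, eps, d)  # avoid division by zero at a data point
        c_new = ((w / d) @ pts) / (w / d).sum()
        if np.linalg.norm(c_new - c) < eps:
            break
        c = c_new
    return c

# Symmetric hypothetical points: the median center is at (2, 0).
print(median_center([(0.0, 0.0), (4.0, 0.0), (2.0, 3.0), (2.0, -3.0)]))
```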
3.1.3 Central Feature
Deﬁnition
Central feature is the object with the minimum total distance to all other
features (see Figure 3.4 ). It is a measure of central tendency. An exhaustive
simple algorithm is used to calculate the total distance of a potential central
feature to all other features. The one that minimizes the total distance is the
central feature ( 3.5):
\text{find the object } j \text{ that minimizes} \sum_{i=1}^{n} d_{ji}    (3.5)

where
n is the number of spatial objects (e.g., points or polygons)
j is one of the n objects, selected as the candidate central feature in an iteration of the algorithm
d_{ji} is the distance of the i-th object to the potential central feature j. The distance of the j-th object to itself is zero. Euclidean or Manhattan distance can be used.

It can also be used when weights are selected, e.g., population or traffic load (3.6):

\text{find the object } j \text{ that minimizes} \sum_{i=1}^{n} w_i d_{ji}    (3.6)

where
w_i is the weight of the i-th object
Why Use
To identify which spatial object from those existing in the dataset is the most
centrally located.
Interpretation
The central feature is usually located near the center of the distribution,
while the weighted central feature can be located far from the center. In this
case, the weighted central feature acts as an attraction pole of a large
magnitude.
Figure 3.4 (A) The central feature is the existing feature that minimizes total distance.
(B) The weighted central feature is the existing feature that minimizes the total
weighted distance (e.g., weighted by city population). The points' locations are the same
in both graphs.
Discussion and Practical Guidelines
The difference with the median center is that a central feature is an existing
object of the database, while the median center is a new point. It can be used if
we have to select a speci ﬁc object from the database rather than a new one –for
example, to ﬁnd which postcode is the central feature in a database. Examples
related to crime and geomarketing analyses include the following:
• Crime analysis: By calculating the central feature, police can locate which
one of the police stations around the city is most centrally located (i.e., a
point layer of police stations).
• Geomarketing analysis: To identify the most centrally located postcode
when locating a new shop. Using population as the weight, we could
locate the central feature that minimizes the distance between the central
postcode and the weighted population.
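The exhaustive search of equations (3.5) and (3.6) is short enough to sketch directly; the function name and the sample points are hypothetical.

```python
import numpy as np

def central_feature(points, weights=None):
    """Index of the existing feature minimizing total (weighted) distance
    to all other features (eqs. 3.5 / 3.6): an exhaustive pairwise search."""
    pts = np.asarray(points, float)
    w = np.ones(len(pts)) if weights is None else np.asarray(weights, float)
    # Pairwise Euclidean distance matrix d[j, i]; the diagonal is zero.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # Total weighted distance from each candidate j to all objects i.
    return int(np.argmin((w[None, :] * d).sum(axis=1)))

# Unweighted: the middle point wins; weighting the first point heavily
# pulls the central feature toward it.
print(central_feature([(0, 0), (1, 0), (5, 0)]))
print(central_feature([(0, 0), (1, 0), (5, 0)], weights=[10, 1, 1]))
```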
3.1.4 Standard Distance
Deﬁnition
Standard distance is a measure of dispersion (spread) that expresses
the compactness of a set of spatial objects. It is represented by a circle the
radius of which equals the standard distance, centered on the mean center of
the distribution (see Figure 3.5 ). It is calculated as in ( 3.7):
SD = \sqrt{ \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n} + \frac{\sum_{i=1}^{n} (y_i - \bar{Y})^2}{n} }    (3.7)

where
n is the total number of spatial objects (e.g., points or polygons)
x_i, y_i are the coordinates of the i-th spatial object
\bar{X}, \bar{Y} are the coordinates of the mean center

The weighted standard distance is calculated as (Figure 3.5B; [3.8]):

SD_w = \sqrt{ \frac{\sum_{i=1}^{n} w_i (x_i - \bar{X}_w)^2}{\sum_{i=1}^{n} w_i} + \frac{\sum_{i=1}^{n} w_i (y_i - \bar{Y}_w)^2}{\sum_{i=1}^{n} w_i} }    (3.8)

where
w_i is the weight (e.g., income or population) of the i-th spatial object
\bar{X}_w, \bar{Y}_w are the coordinates of the weighted mean center
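Equations (3.7) and (3.8) can be sketched as one function (weights of one reduce the weighted form to the unweighted one). Function name and sample coordinates are hypothetical.

```python
import numpy as np

def standard_distance(xs, ys, weights=None):
    """Standard distance (3.7) or weighted standard distance (3.8)
    around the (weighted) mean center."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    w = np.ones(len(xs)) if weights is None else np.asarray(weights, float)
    xc = (w * xs).sum() / w.sum()  # (weighted) mean center x
    yc = (w * ys).sum() / w.sum()  # (weighted) mean center y
    return float(np.sqrt((w * (xs - xc) ** 2).sum() / w.sum()
                         + (w * (ys - yc) ** 2).sum() / w.sum()))

# Four corners of a 2x2 square: mean center (1, 1), SD = sqrt(2).
print(standard_distance([0, 0, 2, 2], [0, 2, 0, 2]))
```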
Why Use
To assess the dispersion of features around the mean center. It is analogous to
standard deviation in descriptive statistics (Lee & Wong[REDACTED PHONE]).
Interpretation
When spatial objects are arranged in such a way that more of them are concen-
trated in the center with fewer objects scattered towards the periphery
(following a Rayleigh distribution) and when we do not use weights, approxi-
mately 63% of the spatial objects lie within one standard distance of the mean
center, and 98% lie within two standard distances of it (Mitchell[REDACTED PHONE]). The
greater the standard distance, the more dispersed the spatial objects are
around the mean.
Discussion and Practical Guidelines
Standard distance is a measure of spatial compactness. It is a single measure of
spatial objects ’distribution around the mean center, similar to statistical stand-
ard deviation, which provides a single measure of the values around the statis-
tical mean. Standard distance is more useful when analyzing point layers than
when measuring polygons, as polygons are usually human constructed (e.g.,
administrative boundaries). Examples related to crime and geomarketing ana-
lyses include the following:
• Crime analysis: Standard distance can be used to compare crime distri-
butions across several time stamps (e.g., crime during the day relative to
crime during the night). If the standard distance during the day is smaller
than that at night and is in a different location, this might indicate that
police should expand patrols to wider areas during the night (see
Figure 3.6A ).
• Geomarketing analysis: We could compare the standard distance of different events, such as the spatial distribution of customers of a mall
Figure 3.5 (A) Standard distance. (B) Weighted standard distance (weight: city
population).
relative to that of the customers of a specific coffee shop. A different
standard distance would reveal a more local and less-dispersed pattern
for the coffee shop clients compared to those of the mall. An appropriate
marketing policy could be initiated based on these ﬁndings at the local
level (for the coffee shop) and the regional level (for the mall; see
Figure 3.6B ).
3.1.5 Standard Deviational Ellipse
Deﬁnition
Standard deviational ellipse is a measure of dispersion (spread) that calculates
the standard distance separately in the x and y directions (see Figure 3.7), as in (3.9)
and (3.10). The ellipse can also be calculated using the locations influenced by
an attribute (weights).

SD_x = \sqrt{ \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n} }    (3.9)

SD_y = \sqrt{ \frac{\sum_{i=1}^{n} (y_i - \bar{Y})^2}{n} }    (3.10)

where
n is the total number of spatial objects (e.g., points or polygons)
x_i, y_i are the coordinates of the i-th spatial object
\bar{X}, \bar{Y} are the coordinates of the mean center
Figure 3.6 (A) Standard distance for the same phenomenon at two different time stamps.
(B) Standard distance for two different sets of events. The different standard distances
reflect the different dispersion patterns of the two clientele groups.
These two standard distances are orthogonal to each other and define the
standard deviational ellipse. The ellipse is centered on the mean center and is
rotated by a particular angle θ from north.

If x'_i = x_i - \bar{X} and y'_i = y_i - \bar{Y} are the deviations of the points from the mean
center, the rotation angle is calculated by (3.11):

\tan\theta = \frac{ \left( \sum_{i=1}^{n} x_i'^2 - \sum_{i=1}^{n} y_i'^2 \right) + \sqrt{ \left( \sum_{i=1}^{n} x_i'^2 - \sum_{i=1}^{n} y_i'^2 \right)^2 + 4 \left( \sum_{i=1}^{n} x_i' y_i' \right)^2 } }{ 2 \sum_{i=1}^{n} x_i' y_i' }    (3.11)

The angle of rotation θ is the angle between north and the major axis. If the
sign of the tangent is positive, the major axis rotates clockwise from
north; if the tangent is negative, it rotates counterclockwise from north
(Lee & Wong[REDACTED PHONE], p. 49).
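Equation (3.11) for the rotation angle can be sketched directly; the function name and sample points are hypothetical, and the degenerate axis-aligned case (denominator zero) is not handled.

```python
import numpy as np

def sde_rotation(xs, ys):
    """Rotation angle theta (radians, clockwise from north) of the
    standard deviational ellipse, per equation (3.11)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    xd, yd = xs - xs.mean(), ys - ys.mean()  # deviations from the mean center
    a = (xd ** 2).sum() - (yd ** 2).sum()    # sum x'^2 - sum y'^2
    b = (xd * yd).sum()                      # sum x'y' (assumed nonzero here)
    return float(np.arctan((a + np.sqrt(a ** 2 + 4 * b ** 2)) / (2 * b)))

# Points along the line y = x: the major axis points northeast,
# i.e., 45 degrees clockwise from north.
print(np.degrees(sde_rotation([0, 1, 2], [0, 1, 2])))
```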
Why Use
The standard deviational ellipse is used to summarize the compactness and the
directional trend (bias) of a geographic distribution.
Interpretation
The direction of the ellipse reveals a tendency toward speci ﬁc directions that
can be further analyzed in relation to the problem being studied. Further-
more, we can compare the size and direction of two or more ellipses,
reﬂecting different spatial arrangements. When the underlying spatial
arrangement is concentrated in the center, with fewer objects lying away from the center, according to a rule of thumb derived from the Rayleigh
Figure 3.7 Standard deviational ellipse reveals dispersion and directional trend.
distribution for two-dimensional data, a one standard deviational ellipse will
cover approximately 63% of the spatial features, two standard deviations will
contain around 98% of the features, and three standard deviations
will cover almost all of the features ([REDACTED PHONE]%; Mitchell[REDACTED PHONE]).
one-dimensional variable, these p ercentages are 68%, 95% and 99%,
respectively, for the normal distribution and standard deviation statistics.
Discussion and Practical Guidelines
When we calculate a standard deviational ellipse, we obtain (a) an ellipse
centered on the mean center of all objects, (b) two standard distances (the
length of the long and short axes) and (c) the orientation of the ellipse. Axis
rotation is calculated so that the sum of the squares of the distances between
the points and axes is minimized. The standard deviational ellipse is more
descriptive and is more widely used than standard distance. Examples include
the following:
• Crime analysis: A standard deviational ellipse can reveal a direction. If
related to other patterns (e.g., bank or shop locations), it could indicate a
potential association (e.g., shop burglaries arranged around a speci ﬁc
route). Police can then better plan patrols around these areas (see
Figure 3.8A ).
• Geomarketing analysis: Plotting the deviational ellipses calculated on the
locations of customers who buy specific products allows us to locate the
areas where people prefer one product over another. Marketing policies
can then be formulated for these locations (see Figure 3.8B ).
3.1.6 Locational Outliers and Spatial Outliers
Deﬁnition: Locational Outlier
A locational outlier is a spatial object that lies far away from its neighbors (see
Figure 3.9). As in descriptive statistics, there is no optimal way to define a
locational outlier. One simple method is to use the same definition as in
descriptive statistics (see Section[REDACTED PHONE]). Under this definition, an object whose
distance to its nearest neighbor exceeds the mean nearest neighbor distance
(computed for the entire dataset) by more than 2.5 standard deviations is
considered a locational outlier, as in (3.12):

\text{Distance from nearest neighbor} \geq \text{Average Nearest Neighbor Distance} + 2.5 \times \text{Standard Deviation}    (3.12)
Why Use
Locational outliers should be traced, as they tend to distort spatial
statistics outputs. For example, if an outlier exists, the mean center will be
significantly different from the mean center when the outlier is not included in
the calculations (see Figure 3.9 ). They may also reveal interesting data patterns
(see the Interpretation and Discussion and Practical Guidelines sections that
follow).
Interpretation
An outlier typically indicates that we should either remove it from the dataset (at least temporarily) or conduct further research to explain its presence. For example, it may indicate incorrect data entry, or an abnormal event at a distant location (such as a remote area in which a virus suddenly appears) that should be further studied.
Discussion and Practical Guidelines
Following the definition given earlier, tracing locational outliers requires calculating (a) the nearest neighbor distance of each object (thus creating a distribution of all nearest neighbor distances), (b) the average nearest neighbor distance of all spatial objects, and (c) the standard deviation of the nearest neighbor distances (this is not the standard distance, as discussed in Section [REDACTED PHONE]).
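The three steps (a)–(c) and the 2.5-standard-deviation cutoff can be sketched as follows; the function name and the brute-force distance search are illustrative, not the book's implementation:

```python
import math
import statistics

def locational_outliers(points, k=2.5):
    """Flag points whose nearest-neighbor distance is at least k
    standard deviations above the mean nearest-neighbor distance
    (k = 2.5 follows the rule of thumb in the text)."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    # (a) nearest-neighbor distance of every point
    nn = [min(dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    # (b) average and (c) standard deviation of those distances
    cutoff = statistics.mean(nn) + k * statistics.pstdev(nn)
    return [p for p, d in zip(points, nn) if d >= cutoff]
```

A dense grid of points plus one distant point flags only the distant point. Note that with very few points the outlier itself inflates the standard deviation, so the cutoff is less reliable for small datasets.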
For polygon features, we can use the area instead of the distance to trace locational outliers. We consider locational outliers as those polygons whose area is considerably larger than that of the rest of the objects (again 2.5 or 3 standard deviations away from the mean). In this case, we do not compare nearest neighbors, but we examine the dataset as a whole. Area is related to distance; the larger the area, the more likely it is that the nearest neighbor distance will increase. The calculation of locational outliers using area is not always reliable, as island polygons with small areas may be outliers but will not be classified as such due to their small area value.
Figure 3.8 (A) Standard deviational ellipse of crimes. A northeast-to-southwest direction is identified. (B) Standard deviational ellipse for two different sets of events. The circles are the locations where customers buy product A, and the squares are the locations where customers buy product B. The directions and dispersion are different, showing that product A penetrates areas different from those of product B.
Tracing the locational outliers of a polygon shape is essential (especially
when calculating spatial autocorrelation; see the next chapter). For example,
administrative boundaries, postcodes and school districts are usually human-
made. These kinds of spatial data typically have small polygons in the center
of a city and larger ones in the outskirts. Thus, the probability of having a
locational outlier is high. Spatial statistics tools (in most software packages)
estimate locational outliers based on the distances among objects; it is thus
better to determine if a polygon is an outlier using distance rather than area.
For this reason, polygons should be handled as centroids, and the existence of outliers should be verified based on the distance and distribution of their centroids.
Figure 3.9 (A) Mean center without any locational outlier. (B) Mean center when the outlier is included in the calculation. The mean center is shifted toward the direction of the locational outlier, distorting the real center of the distribution.
Potential case studies include the following:
• Crime analysis: Identifying whether locational outliers in crime incidents exist may reveal abnormal, unexpected behavior that might need additional surveillance. For example, if a credit card that is usually used in specific locations of a city (e.g., shops, restaurants, a house) is also used the same day at a location miles away, that would be a strong indication of fraud.
• Geomarketing analysis: Locational outliers might not be desirable. For example, if a bank wants to locate a new branch, the existence of locational outliers might place the location point in an area that will not be convenient for most of its clientele. Temporarily excluding the locational outlier will allow for a better branch location. In other words, when we exclude an outlier, we usually exclude it only for certain spatial statistical procedures and include it again afterward, as it might be helpful for another type of analysis.
Definition: Spatial Outlier
A spatial outlier is a spatial entity whose nonspatial attributes are considerably different from the nonspatial attributes of its neighbors (see Figure [REDACTED PHONE]).
Why Use
Spatial outliers reveal anomalies in a spatial dataset that should be further
examined.
Interpretation
Spatial outliers can be interpreted differently depending on the problem at
hand (see examples and case studies later in this chapter). The basic conclusion
we draw when spatial outliers exist is that some locations have attribute values
that differ considerably from those of their neighbors. This means that some
processes vary across space, implying heterogeneity. Various spatial analysis methods can be used to analyze these processes (see Section 3.2 and Chapter 4).
Discussion and Practical Guidelines
A spatial outlier should not be confused with a locational outlier. To detect locational outliers, we analyze only the distance of a spatial entity to its neighbors. No other attribute analysis takes place. To detect spatial outliers, we study whether an attribute value deviates significantly from the attribute values of the neighboring entities. Thus, a spatial outlier does not need to be a locational outlier as well. Additionally, a specific entity may be labeled as a spatial outlier only for a single attribute, while other attribute values might not deviate from the corresponding attribute values of other neighboring entities. Finally, a spatial outlier is not necessarily a global outlier as well, as the spatial outlier is always defined inside a predefined neighborhood.
Let us consider an example. Suppose we study the percentage of flu occurrence within the postcodes of a city (i.e., the percent of people who got the flu in the last month). If we trace a specific postcode with a flu occurrence
percentage considerably higher than that of adjacent postcodes, we might label this postcode as a spatial outlier (this might trigger an emergency alert only for this area; Grekousis & Fotis [REDACTED PHONE], Grekousis & Liu [REDACTED PHONE]). This postcode may be centrally located. Then, although it is a spatial outlier, it would not be a locational outlier (neighboring postcodes are in close proximity). If we examine the income attribute of this postcode in relation to the income attributes of the neighboring postcodes, we might find no significant differences in values. This postcode would then be a spatial outlier only for the flu occurrence percentage attribute and not for the income attribute. A spatial outlier does not necessarily mean that the value of the attribute is a global outlier as well. For example, there might be additional postcodes with similarly high flu occurrence percentages but clustered in another area of the city. It is a spatial outlier because it is different (for some attribute) within its neighborhood (so we have to define a neighborhood to trace the spatial outliers). By tracing spatial outliers, we can detect anomalies in many diverse types of data, including environmental, surveillance, health and financial data.
Depicting them in a 3-D graph is a rapid way of detecting spatial outliers. Objects are located on the X–Y plane by their geographic coordinates. The value of the attribute for each spatial object is depicted on the Z axis. Objects that are significantly higher or lower than their neighbors might be spatial outliers. This method is only graphical. To detect outliers based on numerical results, we can use the Local Moran's I index (Section [REDACTED PHONE]) or the optimized outlier analysis (Section [REDACTED PHONE]), which are explained in the next chapter.
Figure [REDACTED PHONE] (A) Set of events at a set of locations. Each event stands for the location of a house in a neighborhood. (B) If we analyze an attribute (e.g., the size of each house), then a house whose size is considerably different from that of its neighbors is a spatial outlier. For example, the house (event pointed at with an arrow) is a spatial outlier, as its size value is either too large or too small compared to that of its neighbors. As can be observed, a spatial outlier does not have to lie far away from other points.
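The formal tools are Local Moran's I and the optimized outlier analysis, but the core idea (compare a value with the values of its neighbors) can be illustrated with a naive screen; the radius-based neighborhood and the 2-standard-deviation threshold here are illustrative choices, not the book's method:

```python
import math
import statistics

def spatial_outliers(points, values, radius, k=2.0):
    """Naive spatial-outlier screen (not Local Moran's I): flag index i
    when its attribute value deviates from the mean of its neighbors'
    values (neighbors = points within `radius`) by more than k
    standard deviations of the whole value distribution."""
    sd = statistics.pstdev(values)
    flagged = []
    for i, p in enumerate(points):
        neigh = [values[j] for j, q in enumerate(points)
                 if j != i and math.hypot(p[0] - q[0], p[1] - q[1]) <= radius]
        if neigh and abs(values[i] - statistics.mean(neigh)) > k * sd:
            flagged.append(i)
    return flagged
```

On a uniform grid where one point carries an extreme value, only that point is flagged; the point need not be far from its neighbors, matching the distinction drawn above between spatial and locational outliers.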
Spatial outlier detection can be used for the following:
• Crime analysis: One could identify if a postcode has a high crime rate while all the neighboring ones have low rates. This might indicate a ghetto.
• Geomarketing analysis: Identifying a specific postcode where people buy a certain product considerably less often than in adjacent ones might indicate where access to the product is not easy or where a better marketing campaign is required.
3.2 Analyzing Spatial Patterns: Point Pattern Analysis
Deﬁnitions
Spatial point pattern S is a set of locations S = {s_1, s_2, s_3, ..., s_n} in a predefined region, R, where n events have been recorded (Gatrell et al. [REDACTED PHONE]; see Figure [REDACTED PHONE]). Put simply, a point pattern consists of a set of events at a set of locations, where each event represents a single instance of the phenomenon of interest (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]).
Event is the occurrence of a phenomenon, a state, or an observation at a particular location (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]).
Spatiotemporal point pattern is a spatial pattern of events that evolves over time. In spatiotemporal point patterns, multiple sets of events occur diachronically. These events include several links among places based on physical connections, functional interactions or processes that relate one place to another (Lu & Thill [REDACTED PHONE]).
Figure [REDACTED PHONE] Spatial point pattern: Events arranged in a geographic space create a spatial point pattern. A map of events (point pattern) is the realization of a spatial process.
Why Use
Point pattern analysis is used mainly to (a) describe the variation of events across space and (b) identify relationships among the events by defining the spatial process that triggers their arrangement.
Discussion and Practical Guidelines
Sets of point events are widespread in geographical analysis. The locations of
shops, crimes, accidents or customers are just a few examples of sets of events
that create point patterns in space. Questions can be asked when such data are available: "Where do most of our customers live?" "Where is traffic accident density highest?" and "Are there any crime hot spots in the study area?" Although mapping events provides an initial assessment of their spatial arrangement, more advanced techniques should be used to provide a quantitative evaluation of their geographical distribution. The rationale behind such analysis is deeper. By describing the spatial distribution of events, we attempt to identify the spatial process leading to this formation. The core idea is that there is a spatial process that generates a specific arrangement of events (spatial point pattern) and that this formation is not just the result of a random procedure. The point pattern is the realization of this spatial process in space, so the process itself is worth further study. Identifying the process allows us to apply the appropriate measures if we want to change the observed pattern (see Figure [REDACTED PHONE]).
The term "event" is commonly used in spatial analysis to distinguish the location of an observation from any other arbitrary location within the study area. The term "point" is also often used to describe a point pattern. Events are described by their coordinates s_i(x_i, y_i) and a set of attributes related to the studied phenomenon. The events should be a complete enumeration of the spatial entities being studied and not just a sample (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]). In other words, all available events should be used in point pattern analysis. Although point pattern analysis techniques can be used for samples, the results are very sensitive to missing events.
Most point pattern analysis techniques deal only with the location of the events and not with other attributes they might carry (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]). The analysis of attribute value patterns is usually conducted using spatial autocorrelation methods, spatial clustering or spatial regression, as explained in the following chapters (or using geostatistics for field data). The analysis of point patterns may start with centrographic measures, as discussed in Section 3.1, but more sophisticated measures are required, as discussed next.
[REDACTED PHONE] Definitions: Spatial Process, Complete Spatial Randomness, First- and Second-Order Effects
Definition
Spatial process is a description of how a spatial pattern can be generated. There are three main types of spatial process (Oyana & Margai [REDACTED PHONE] p. [REDACTED PHONE]):
• Complete spatial randomness process, also called independent random process (IRP), is a process whereby spatial objects (or their attribute values) are scattered over the geographical space based on two principles (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]):
(a) There is an equal probability of event occurrence at any location in the study region (also called first-order stationary).
(b) The location of an event is independent of the locations of other events (also called second-order stationary).
Figure [REDACTED PHONE] A spatial process (not yet known) creates a point pattern that can be described through point pattern analysis. When the spatial process is defined (through the point pattern analysis), measures can be applied to modify the spatial process if needed. For example, the spatial analysis of disease will probably reveal a clustered pattern through an aggregating process. If we locate hot spots, we may apply measures (e.g., vaccination) to specific geographical areas to prevent an expansion of the problem.
• Competitive process is a process that leads events to be arranged as far away from each other as possible. The concept is that each event should be located at an even, large distance from each neighboring event so that it maximizes its zone of influence/impact. In this respect, events tend to be uniformly distributed.
• Aggregating process is a process where events tend to cluster as a result of some pulling action.
There are three main types of spatial arrangement/pattern associated with the above spatial processes (Oyana & Margai [REDACTED PHONE] p. [REDACTED PHONE]; see Figure [REDACTED PHONE]):
• Random spatial pattern: In this type of arrangement, events are randomly scattered all over the study area (see Figure 3.13B). This pattern has moderate variation and is similar to a Poisson distribution (Oyana & Margai [REDACTED PHONE] p. [REDACTED PHONE]). It is the result of an independent random process.
• Dispersed: The events are located uniformly around the study area. This is the result of a competitive process, creating a pattern with little or no variation (see Figure 3.13A). Events are located as far as possible from their neighbors. For instance, the locations of bank branches are likely to form a dispersed pattern, as there is no reason to have branches of the same bank located near each other.
• Clustered: The events create clusters in some parts of the study area, and the pattern has a large variation (see Figure 3.13C). This is the result of an aggregating process. For example, most hotel locations in a city tend to cluster around historical landmarks or major transportation hubs.
First-order spatial variation effect occurs when the values or locations of spatial objects vary from place to place due to a local effect of space (the equal probability assumption of IRP no longer holds; O'Sullivan & Unwin [REDACTED PHONE] p. 29). For example, stroke event locations may vary from place to place inside a city but are expected to be concentrated in places where more of the population is located or in places closer to industrial zones. In other words, events do not occur with the same probability everywhere. This type of spatial effect on the variation of a value or on the location of events is called the first-order effect. First-order effects are mostly associated with density/intensity measures (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]).
Figure [REDACTED PHONE] (A) In a dispersed pattern, events are scattered in a nearly uniform way. (B) In a random spatial pattern, we cannot identify any clusters or dispersion. (C) In a clustered spatial pattern, clusters will be evident in some parts of the region.
Second-order spatial variation effect occurs when there is interaction among nearby locations. Here, the location, or the value of an observation, is highly influenced by a nearby location or the value of a neighboring observation (the independence assumption of IRP no longer holds). For example, immigrants new to a city are more likely to reside in a neighborhood where people of the same ethnicity live, as they would probably feel more comfortable there. In this case, there is a strong local interaction that attracts newcomers; this is a second-order effect of space. Second-order effects are mostly associated with distance measures (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]).
Stationarity is the state in which (a) every location in space has an equal probability of receiving an event through a spatial process and (b) there is no interaction among events (i.e., independence; O'Sullivan & Unwin [REDACTED PHONE] p. 65).
First-order stationary process is a process in which there is no variation in
its intensity across space.
Second-order stationary process is a process in which there is no inter-
action among events.
Intensity of a spatial process is the probability that each small geographical
area will receive an event. This probability is the same for every location in a
stationary spatial process.
Anisotropic process is a process in which the intensity varies according to specific directions (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]).
Isotropic process, on the other hand, is a spatial process in which directional effects do not exist.
Spatial heterogeneity refers to the non-stationarity of a geographic process (Demšar et al. [REDACTED PHONE]).
[REDACTED PHONE] Spatial Process
Each spatial pattern in a particular space and at a specific moment in time is the result of a process occurring within a wider space and time. For a given set of events, it is interesting from the spatial analysis perspective to analyze (a) the observed spatial pattern and (b) the process that created this arrangement (if any). We can summarize the above by posing the following research question: "Can the observed point pattern be the result of a hypothesized spatial process?"
A spatial process can be either (a) deterministic, when inputs and outputs are certain over time, or (b) stochastic, when the outcome is subject to variation
and cannot be defined precisely by a mathematical function. Typically, a spatial pattern is the potential realization of some stochastic process. The most common stochastic process is the independent random process (IRP), also known as complete spatial randomness.
First- and second-order effects express the variations in intensity of a spatial process across the study area. A spatial process is first-order stationary if there is no variation in its intensity over space, meaning that there is an equal probability of an event occurring in the study region. A second-order stationary process occurs when there is no interaction among events. Complete spatial randomness is both first- and second-order stationary. Simply put, in complete spatial randomness, there are no first- or second-order effects. When first- and/or second-order effects exist, the chances of an event occurring at some specific location vary (due to dependence and interactions with the neighboring events); thus, the process is no longer stationary (O'Sullivan & Unwin [REDACTED PHONE] p. 65). In zero spatial autocorrelation, no first- or second-order effects exist, and location or value variation is random (no patterns are detected). First- and second-order effects are not always easily distinguished, but tracing their existence is extremely important for the accurate modeling of spatial data (O'Sullivan & Unwin [REDACTED PHONE] p. 29).
Complete spatial randomness refers to (a) the specific locations in which objects are arranged and (b) the spatial arrangement of the spatial objects' attribute values. In the first case, the techniques used to detect complete spatial randomness analyze only the location of the spatial objects. Most of these techniques are point pattern analysis methods and are used for point features. In the second case, the focus is on how the values of one or more of the variables for fixed locations are spatially arranged. This approach deals mostly with areal data (polygon features). Polygons are usually artificially created to express certain boundaries. Therefore, instead of analyzing their distribution as polygons, which would offer no valuable results, we study how their attribute values are arranged in space. Spatial autocorrelation methods are used in this case and will be explained in detail in Chapter 4.
However, what is randomness in a spatial context? In nonspatial inferential statistics, we use representative samples to make inferences for the entire population. To do so, we set a null hypothesis and reject it or fail to reject it using a statistical test. In a spatial context, the null hypothesis used is (typically) complete spatial randomness. Under this hypothesis, the observed spatial pattern is the result of a random process. This means that the probability of finding clusters in our data is minimal, and there is no spatial autocorrelation. To better understand the spatial random process and spatial random pattern, consider the following example. Imagine that you have [REDACTED PHONE] square paper cards. Each card has a single color: red, green or blue. If we throw (rearrange) all the cards onto the floor, we have our first spatial arrangement (spatial pattern). Is this spatial pattern of colors random, or is it clustered? It is most likely that cards will be scattered, and colors will be
mixed. The probability that most of the red cards will be clustered together after our first throw is almost zero. If we repeat the throw 1,[REDACTED PHONE] times, we might identify some clusters of red cards in specific regions occurring, say, in 10 out of the 1,[REDACTED PHONE] throws. This shows that the probability of obtaining a cluster of reds is 1%. In other words, the random process (throwing cards on the floor) generated a random spatial arrangement of objects (spatial pattern) for [REDACTED PHONE]% of the trials. Thus, the spatial pattern of the objects is random. The statistical procedure is not as simple as that described here, as it includes additional calculations and assumptions.
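The card-throwing thought experiment can be turned into a small Monte Carlo simulation; all numbers below (point count, grid size, cluster threshold) are illustrative, and a real test would use a proper statistic rather than a raw cell count:

```python
import random

def csr_cluster_probability(n_points=100, grid=4, threshold=15,
                            trials=1000, seed=42):
    """Scatter n_points uniformly in the unit square (one 'throw'),
    count points per grid cell, and record how often any cell collects
    at least `threshold` points. The returned fraction estimates the
    probability that complete spatial randomness alone produces such a
    concentration."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        counts = [0] * (grid * grid)
        for _ in range(n_points):
            cx = min(int(rng.random() * grid), grid - 1)
            cy = min(int(rng.random() * grid), grid - 1)
            counts[cy * grid + cx] += 1
        if max(counts) >= threshold:
            hits += 1
    return hits / trials
```

With 100 points and 16 cells the expected count per cell is 6.25, so a cell with 15 or more points is rare under randomness; lowering the threshold makes such "clusters" common, mirroring the percentage-of-throws argument in the text.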
For example, to reject or not reject the null hypothesis of complete spatial randomness, we use two metrics: a z-score and a p-value (see Section [REDACTED PHONE]). The z-score is the critical value used to calculate the associated p-value under a standard normal distribution. It reflects the number of standard deviations at which an attribute value lies from its mean. The p-value is a probability that defines the confidence level. When the p-value is very small, the probability that the observed pattern was created randomly is minimal (Mitchell [REDACTED PHONE]). It indicates the probability that the observed pattern is the result of a random process.
A small p-value in spatial statistical language for the complete spatial ran-
domness hypothesis is translated as follows: It is very unlikely that the observed
pattern is the result of a random process, and we can reject the null hypothesis.
There is strong evidence that there is an underlying spatial process at play. This
is precisely what a geographer is interested in: locating and explaining spatial
processes affecting the values of any spatial arrangement or spatial
phenomenon.
If the data exhibit spatial randomness, there is no pattern or underlying
process. Further geographical analysis is thus unnecessary. If the data do not
exhibit spatial randomness, then they do not have the same probability of occurrence in space (first-order effect presence) and/or the location of an event depends on the location of other events (second-order effect presence). In this case, spatial analysis is very important, as space has an effect on event occurrences and their attribute values.
To conclude, when first-order effects exist, the location is a major determinant of event occurrence. When second-order effects exist, the interactions among events are largely influenced by their distance. The absence of a second-order stationary process leads to either uniformity or clustering.
3.3 Point Pattern Analysis Methods
There are two main (interrelated) methods of analyzing point patterns, namely
the distance-based methods and the density-based methods.
• Distance-based methods employ the distances among events and describe second-order effects. Such methods include the nearest neighbor method
(Clark & Evans [REDACTED PHONE]; see Section [REDACTED PHONE]), the G and F distance functions, Ripley's K distance function (Ripley [REDACTED PHONE]) and its transformation, the L function (see Section [REDACTED PHONE]).
• Density-based methods use the intensity of event occurrence across space. For this reason, they describe first-order effects better. Quadrat count methods and kernel estimation methods are common density-based methods. In quadrat count methods, space is divided into a regular grid (such as a grid of squares or hexagons) of unitary areas. Each unitary region includes a different number of points due to a spatial process. The distribution analysis and its correspondence to a spatial pattern are based on probabilistic and statistical methods. Another, more widely used method is kernel density estimation (KDE; see Section [REDACTED PHONE]). This method is better than the quadrat method because it provides a local estimation of the point pattern density at any location of the study area, not only for the locations where events occur (O'Sullivan & Unwin [REDACTED PHONE] p. 85).
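Kernel density estimation replaces cell counting with a smooth kernel centered on every event. A minimal Gaussian-kernel sketch follows; the bandwidth is an assumed user choice, and real implementations add edge correction and faster neighbor search:

```python
import math

def kernel_density(x, y, events, bandwidth):
    """Gaussian kernel density estimate at (x, y): each event
    contributes a bump that decays with distance, controlled by the
    bandwidth; the sum is normalized so the surface integrates to 1."""
    h2 = bandwidth * bandwidth
    total = sum(math.exp(-((x - ex) ** 2 + (y - ey) ** 2) / (2.0 * h2))
                for ex, ey in events)
    return total / (2.0 * math.pi * h2 * len(events))
```

Unlike a quadrat count, this yields a density estimate at any location in the study area, not just where events occur.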
Another point pattern analysis approach (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]) involves a combination of density and distance, thereby creating proximity polygons. Proximity polygons, such as Delaunay triangulation, and related constructions, such as the Gabriel graph and the minimum spanning tree of a point pattern, display interesting measurable properties. For example, analyzing the distribution of area among nearby polygons shows how evenly spaced the events in question are. Furthermore, the number of an event's neighbors in the Delaunay triangulation and the lengths of the edges are indicators of how the events are distributed in space.
Finally, another way of analyzing events is by using hot spot analysis. This is mainly used to identify if clusters of values of a specific variable are formed in space. It is most commonly used with polygon features and is explained in detail in Section [REDACTED PHONE]. When considering point features with no associated variables, it is interesting to study the spatial objects' arrangement in terms of intensity levels. This is achieved using optimized hot spot analysis (see Section [REDACTED PHONE] for a more in-depth analysis). Strictly speaking, this type of analysis does not assess the type of point pattern or the spatial process that generated it, but it does analyze the location of the events in order to identify any spatial autocorrelation.
[REDACTED PHONE] Nearest Neighbor Analysis
Definition
Nearest neighbor analysis (also called average nearest neighbor) is a statistical test used to assess the spatial process from which a point pattern has been generated. It is calculated based on formula ([REDACTED PHONE]) (Clark & Evans [REDACTED PHONE], O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]):
R = observed mean distance / expected mean distance = d̄_min / E(d) = 2 · d̄_min · √(n/a)    ([REDACTED PHONE])

where

d̄_min = ( Σ_{i=1}^{n} d_min(s_i) ) / n    ([REDACTED PHONE])

E(d) = 1 / ( 2 · √(n/a) )    ([REDACTED PHONE])

d̄_min is the average nearest neighbor distance of the observed spatial pattern
d_min(s_i) is the distance of event s_i to its nearest neighbor
n is the total number of events
E(d) is the expected value of the mean nearest neighbor distance under complete spatial randomness
a is the area of the study region or, if not defined, of the minimum enclosing rectangle around all events
Why Use
To decide if the point pattern is random, dispersed or clustered.
Interpretation
A nearest neighbor ratio Rless than 1 indicates a process toward clustering.
A value larger than 1 indicates that the pattern is dispersed due to a competi-
tive process. A value close to 1 re ﬂects a random pattern.
Discussion and Practical Guidelines
In practice, this method identi ﬁes whether the random, dispersed or clus-
tered pattern better describes a given set of events (also called observed
spatial pattern). The statistic tests the null hypothesis that the observed
pattern is random and is generated by c omplete spatial randomness. The
method compares the observed spatial distribution to a random theoretical
one (i.e., Poisson distribution; Oyana & Margai[REDACTED PHONE] p. [REDACTED PHONE]). The test
outputs the observed mean distance, the expected mean distance (through
a homogeneous Poisson point process), the nearest neighbor ratio R,t h e
p-value and the z-score. The expected mean distance is the mean distance
the same number of events would most probably have if they were ran-
domly scattered in the same study area. The p-value is the probability that
the observed point pattern is the result of complete spatial randomness.
The smaller the p-value, the less likely it is that the observed pattern has
been generated by complete spatial randomness. If the p-value is larger171 3.3 Point Pattern Analysis Methods
                     

--- Page[REDACTED PHONE] ---
than the significance level, we cannot reject the null hypothesis (as there is insufficient evidence), but we cannot accept it either (for more on how to interpret p-values, see Section [REDACTED PHONE]). In this case, we have to use other methods (such as those presented later) to determine the observed pattern's type.
A disadvantage of this method is that it summarizes the entire pattern using a single value. It is therefore used mainly as an indication of whether a clustered or a dispersed pattern exists and less often to locate where this process takes place. In addition, this method is highly influenced by the study area's size. A large size might indicate clustering for events for which a smaller size would probably reveal dispersion. Potential case studies include tracing clustering in a disease outbreak or identifying whether customers of a product are dispersed throughout a region.
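The statistic above is straightforward to compute directly. The sketch below is a minimal, brute-force illustration (the function name and sample data are mine; real tools also report the z-score and p-value, omitted here):

```python
# Minimal sketch of the average nearest neighbor statistic.
# Brute-force O(n^2) search; assumes the study area "area" is known
# (e.g., the minimum enclosing rectangle), as in the text.
import math

def nearest_neighbor_ratio(points, area):
    """Return (observed mean NN distance, expected E(d) under CSR, ratio R)."""
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # distance from event i to its nearest neighbor
        d_min = min(math.hypot(xi - xj, yi - yj)
                    for j, (xj, yj) in enumerate(points) if j != i)
        total += d_min
    d_obs = total / n                          # observed mean distance
    d_exp = 1.0 / (2.0 * math.sqrt(n / area))  # E(d) under CSR
    return d_obs, d_exp, d_obs / d_exp

# Four events on a regular grid inside a unit study area.
d_obs, d_exp, ratio = nearest_neighbor_ratio(
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)], area=1.0)
```

For this regular arrangement the ratio is well above 1, matching the "dispersed due to a competitive process" reading above.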
[REDACTED PHONE] Ripley's K Function and the L Function Transformation
Deﬁnition
Ripley's K function is a spatial analysis method of analyzing point patterns based on a distance function (Ripley[REDACTED PHONE]). The outcome of the function is the expected number of events inside a radius of d (Oyana & Margai[REDACTED PHONE] p. [REDACTED PHONE]). It is calculated as a series of incremental distances d centered on each of the events in turn ([REDACTED PHONE]).
$$K(d) = \frac{a}{n^2}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n}\frac{I_d(d_{ij})}{w_{ij}}$$
where
d is an incremental distance value defined by the user (the range of distances over which the function is calculated).
d_ij is the distance between a target event i and a neighboring event j.
n is the total number of events.
a is the area of the region containing all (n) features.
I_d is an indicator factor. I_d is 1 if the distance between the i and j events is less than d; otherwise it is 0.
w_ij are weights, most often calculated as the proportion of the circumference of a circle with radius d around the target event (Oyana & Margai[REDACTED PHONE] p. [REDACTED PHONE]). They are used to correct for edge effects.
To better understand how this function works, imagine that, for each event i (the target), we place a circle of radius d and then count how many events lie inside this circle. This count is the sum of the indicator values I_d for event i. The procedure is repeated for all events and for a range of distances d (e.g., every 10 m starting from 50 m to [REDACTED PHONE] m).
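The counting procedure just described can be sketched as follows. This is an illustration only: the edge-correction weights w_ij are fixed at 1 (no correction), and the function name is mine, not ArcGIS's:

```python
# Naive Ripley's K: count, for every event i, the neighbors j within
# distance d, then scale by a / n^2. Edge weights w_ij are taken as 1.
import math

def ripley_k(points, area, d):
    n = len(points)
    count = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            # indicator I_d: 1 if event j lies within distance d of event i
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                count += 1
    return area * count / (n * n)
```

Under complete spatial randomness K(d) is approximately πd²; observed values above that suggest clustering at distance d, and values below it suggest dispersion.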
                     

In practice, the original Ripley's K function produces large values as the distance increases. Many mathematical transformations of Ripley's K function exist to account for this problem. A widely used transformation is the L function (Bailey & Gatrell[REDACTED PHONE]; O'Sullivan & Unwin[REDACTED PHONE] p. [REDACTED PHONE]). This allows for fast computation and also makes the L function linear under a Poisson distribution, which enables easier interpretation, as in ([REDACTED PHONE]) (Oyana & Margai[REDACTED PHONE] p. [REDACTED PHONE]):

$$L(d) = \sqrt{\frac{K(d)}{\pi}} - d$$
Why Use
K and L functions are used (a) to decide whether the observed point pattern is random, dispersed or clustered at a specific distance or range of distances and (b) to identify the distance at which the clustering or dispersion is most pronounced.
Interpretation
With the L function (Eq. [REDACTED PHONE]), the expected value in the case of complete spatial randomness is equal to the input distance d. This means that for a random pattern, the L function equals zero; for a clustered pattern, it takes a positive value; and for a dispersed pattern, it takes a negative value. To avoid negative and close-to-zero values, we can omit d from Eq. ([REDACTED PHONE]) and plot L against d to get a graph like the one in Figure[REDACTED PHONE] (ArcGIS applies this approach with a slightly different denominator for K). With this L transformation, the expected value is again equal to distance d, which can now be used as a reference line at a 45° angle. When the observed value of the L function is larger than the expected value for a particular distance or range of distances (see the area above the expected line in Figure[REDACTED PHONE]), the distribution is more clustered than a random distribution.
When the observed value is larger than the upper confidence envelope value, the spatial clustering is statistically significant (see Figure[REDACTED PHONE]; Mitchell[REDACTED PHONE]). On the other hand, when the observed value of the L function is smaller than the expected value (the area below the expected line), the distribution is more dispersed than a distribution resulting from complete spatial randomness. Moreover, when the observed value is smaller than the lower confidence envelope, the spatial dispersion is statistically significant for this distance (see Figure[REDACTED PHONE]).
The confidence envelope is the area in which the expected values would lie in a random pattern for a specific confidence level (see the Discussion and Practical Guidelines section). To simulate this process, the points are randomly distributed many times (e.g., 99 or[REDACTED PHONE] times) using the Monte Carlo approach. Each one of these times is called a permutation, and the expected value is
                     

calculated for each permutation, creating a sampling distribution. For[REDACTED PHONE] permutations, the p-value is typically set to[REDACTED PHONE]; for 99 permutations, it is set to[REDACTED PHONE]. These reflect 99% and[REDACTED PHONE]% confidence intervals, respectively. When a value lies outside the envelope area, the null hypothesis of complete spatial randomness is rejected.
Discussion and Practical Guidelines
The process used to calculate the L function occurs in the following steps (Oyana & Margai[REDACTED PHONE] p. [REDACTED PHONE]):
1. Calculate the observed value K (using the K function) of the observed set of events. This value is obtained by placing a buffer with radius d over a target event and then counting how many events lie inside this zone. Follow the same process for every event in the dataset. Then calculate the average count of events within a range of distances.
2. Transform the K function estimates to an L function to make it linear.
3. With the L transformation, the expected value is equal to distance d.
4. Determine the confidence envelope by estimating the minimum and maximum L values using permutations and the related significance levels under the null hypothesis of complete spatial randomness.
Figure[REDACTED PHONE] L function transformation of Ripley's K function over a range of distances
                     

5. Plot L on a graph to reveal whether clustering or dispersion is evident at various distances by comparing the expected to the observed values and according to whether the observed values lie inside or outside the confidence envelope.
The K and L functions are sensitive to the definition of the bounding rectangle in which the features are contained. This raises two issues. First, the study region area has a direct effect on the density parameter; second, it gives rise to the edge effect problem (see Section 1.3), in which events that lie at the edge of the study area tend to have fewer neighbors than those lying in central locations. In the first case, identical arrangements of events are likely to yield different results for various study area sizes. For example, if the size doubles but the points are concentrated in the center, this might result in a clustered pattern. For the exact same event arrangement, however, if the study area is defined by the minimum enclosing rectangle, the pattern might be dispersed or random.
On the other hand, a tight study area can cause the edge effect problem. Events close to the region boundaries tend to have larger nearest neighbor distances, although they might have neighbors just outside the boundaries that lie in closer proximity than those lying inside (O'Sullivan & Unwin[REDACTED PHONE] p. 95). Various methods have been developed to account for edge effects (known as edge correction techniques), including guard zones, simulating outer boundaries, shrinking the study area and Ripley's edge correction formula.
The Monte Carlo simulation approach is a more reliable method of accounting for edge effects and the study region area effect. A Monte Carlo procedure generates and allocates n events randomly over the study region hundreds or thousands of times (permutations), creating a sampling distribution. The observed pattern is then compared with the patterns generated under complete spatial randomness through Monte Carlo simulation. The results of complete spatial randomness are also used to construct an envelope inside which the L value of a random spatial pattern is expected to lie. If there is a statistically significant difference between the observed and the simulated patterns (those lying outside the envelope), then we may reject the null hypothesis that the observed pattern is the result of complete spatial randomness. Since each permutation is subject to the same edge effects, the obtained sampling distribution of the Monte Carlo procedure accounts for both edge and study region area effects simultaneously, without the need to apply other corrections (O'Sullivan & Unwin[REDACTED PHONE] p. [REDACTED PHONE]).
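A Monte Carlo envelope of the kind described above can be sketched as follows; the study region is assumed to be a square of the given area, and edge effects are ignored (every permutation is subject to the same ones, as the text notes):

```python
# L function plus a Monte Carlo confidence envelope under complete
# spatial randomness (CSR). Illustrative sketch, not a production tool.
import math
import random

def l_function(points, area, d):
    n = len(points)
    count = sum(1 for i, (xi, yi) in enumerate(points)
                  for j, (xj, yj) in enumerate(points)
                  if i != j and math.hypot(xi - xj, yi - yj) <= d)
    k = area * count / (n * n)
    return math.sqrt(k / math.pi) - d  # approximately 0 under CSR

def csr_envelope(n, area, d, permutations=99, seed=0):
    """Scatter n events uniformly 'permutations' times; return (low, high) L."""
    rng = random.Random(seed)
    side = math.sqrt(area)
    values = [l_function([(rng.uniform(0, side), rng.uniform(0, side))
                          for _ in range(n)], area, d)
              for _ in range(permutations)]
    return min(values), max(values)
```

An observed L above the upper envelope value indicates statistically significant clustering at distance d; one below the lower value indicates significant dispersion.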
Spatial patterns change when studied at multiple distances and spatial scales, reflecting the existence of particular spatial processes at specific distance ranges (Mitchell[REDACTED PHONE]). In other words, a spatial pattern might be random for some range of distances and clustered for others. Ripley's K function illustrates how the spatial clustering or dispersion of features' centroids
                     

changes when the neighborhood size changes by summarizing the clustering or dispersion over a range of different distances, which is helpful for assessing the proper scale of analysis for the problem at hand. This is similar to how the scale of analysis is defined in spatial autocorrelation studies (see Chapter 4). When we analyze a point distribution at different distances, we may trace the distance at which clustering starts and ends and the distance at which dispersion begins and ends. Thus, based on the scope of the research, for a single-point dataset, we may address different questions relating to the scale of analysis and the patterns of clustering at various distances (e.g., studying how the clustering of crime changes at different distances). In addition, this type of function does not provide a single description of the pattern in question (as we obtain when we use average nearest neighbor analysis) but provides deeper insight, as we describe the pattern at whichever distances we choose.
[REDACTED PHONE] Kernel Density Function
Deﬁnition
Kernel density estimation is a nonparametric method that uses kernel functions to create smooth maps of density values, in which the density at each location indicates the concentration of points within the neighboring area (high concentrations as peaks, low concentrations as valleys; see Figure[REDACTED PHONE]). The kernel density at a specific point is estimated using a kernel function with two or more dimensions. The term "kernel" in spatial statistics refers mainly to a window function centered over an area that systematically moves to each location while the respective formula is calculated. Kernel functions weight events according to distance, so that closer events are weighted more heavily than events further away (O'Sullivan & Unwin[REDACTED PHONE] p. 87).
A typical kernel function (Gaussian) looks like a bell, is centered over each event and is calculated, within bandwidth h, based on Eq. ([REDACTED PHONE]) (Wang[REDACTED PHONE] p. 49, Oyana & Margai[REDACTED PHONE] p. [REDACTED PHONE]; see Figure[REDACTED PHONE]):
$$\hat{f}(x) = \frac{1}{nh^r}\sum_{i=1}^{n} k\left(\frac{d}{h}\right)$$
where
h is the bandwidth (the radius of the search area)
n is the number of events inside the search radius
r is the data dimensionality
k() is the kernel function
d is the distance between the central event s_i and an event s inside the bandwidth
                     

The kernel function calculates the probability density of an event at some distance from a reference point (Oyana & Margai[REDACTED PHONE] p. [REDACTED PHONE]). Various kernel types can be plugged into the preceding formula, including normal, uniform, quadratic and Gaussian types. ArcGIS uses the quadratic kernel function described in Silverman (Silverman[REDACTED PHONE], Wang[REDACTED PHONE] p. 50). The functional form is as shown in ([REDACTED PHONE]):

$$\hat{f}(x) = \frac{1}{nh^2\pi}\sum_{i=1}^{n}\left(1 - \frac{d^2}{h^2}\right)$$
Why Use
Kernel density estimation is used to create smooth surfaces that depict, first, the density of events and, second, an estimation of areas of higher or lower event occurrence intensity. Through the raster map visualization of a kernel density estimation, one can quickly locate hot spot and cold spot areas and places of higher or lower density (see Figure[REDACTED PHONE]). It can also be applied for cluster analysis of weighted or unweighted events in epidemiology, criminology, demography, ethnology and urban analysis.
Figure[REDACTED PHONE] (A) The kernel density function is calculated for every single event in turn. (B) We can imagine a kernel as a bell with the same total weight of one unit. The total weight of all bells equals the total number of points (Longley et al.[REDACTED PHONE] p. [REDACTED PHONE]). Failing to retain the total population would suggest that there are more or fewer events in the study area than there really are (O'Sullivan & Unwin[REDACTED PHONE] p. 87). The shape of the kernel depends on the bandwidth parameter h. A large bandwidth results in a broader, lower kernel, while a smaller bandwidth leads to a taller kernel with a smaller base. Large bandwidths are more suitable for revealing regional patterns, while smaller bandwidths are more suitable for local analysis (Fotheringham et al.[REDACTED PHONE] p. 46). When all events are replaced by their kernels and all kernels are added, a raster density surface is created, as seen in C. (C) Darker areas indicate higher intensity while lighter areas indicate lower intensity. The output raster allows for a better understanding of how events are arranged in space, as it renders every single cell, creating a smooth surface that also reflects the probability of event occurrence.
                     

Interpretation
Kernel density estimates the density of the point features, rather than their values (e.g., temperature or height; Silverman[REDACTED PHONE], Bailey & Gatrell[REDACTED PHONE]). The density estimates can be displayed by either surface maps or contour maps that show the intensity at all locations (see Figure 3.15C). Peaks reveal higher concentrations and valleys lower densities.
Discussion and Practical Guidelines
The kernel function provides an interpolation technique for assessing the impact of each event on its neighborhood (as defined by the bandwidth; Oyana & Margai[REDACTED PHONE] p. [REDACTED PHONE]). In practice, the kernel function is applied to a limited area around each event, defined by the bandwidth h, to "spread" its effect across space (see Figure 3.15A). Outside the specified bandwidth, the value of the function is zero. For example, the quadratic kernel function falls off gradually to zero with distance until the radius is reached. Each individual event's impact is depicted as a 3-D surface (e.g., like a bell; see Figure 3.15B). The final density is calculated by adding, for each raster cell, the intersections of the individual surfaces.
The main advantage of kernel density estimation is that the resulting density function is continuous at all points along the scale. In addition, kernel density estimation weights nearby events more heavily than those lying further away using a weighting function, thus applying an attenuating effect in space. This is a major difference from a typical point density calculation, in which density is calculated for each point based on the number of surrounding objects divided by the surrounding area, with no differentiation based on proximity or any other weighting scheme. Kernel density spreads the known quantity to the raster cells inside the specified radius.
Figure[REDACTED PHONE] Kernel density estimation. A raster output depicts the intensity of events in the study area.
                     

A weak point of this technique is the subjectivity in the definition of the bandwidth factor h. The selection of an appropriate value for h is usually made by trial and error. Large bandwidth values create a less detailed raster. On the other hand, small values create a raster with a more realistic density (see the note on Figure 3.15B; Mitchell[REDACTED PHONE]). The selection of the appropriate bandwidth depends on the problem at hand and the desired analysis scale. As mentioned in the previous section, both Ripley's K and its transformed L function provide a graph depicting the distances at which clustering or dispersion is more pronounced. One can first apply the L function and then select the distance that best fits the scope of the analysis, which also reflects the spatial process at play. Optimized hot spot analysis may also be used to identify the distance at which spatial autocorrelation is most evident, which can then be used as the bandwidth h (see Section[REDACTED PHONE]). Another approach is to use as the initial bandwidth value the one that results from formula ([REDACTED PHONE]) (ESRI[REDACTED PHONE]):
$$h = 0.9 \times \min\left(SD,\ \sqrt{\frac{1}{\ln 2}}\, D_m\right) \times n^{-0.[REDACTED PHONE]}$$
where
h is the bandwidth (the radius of the search area)
n is the number of events; if a count field is used, then n is the sum of the counts
D_m is the median distance
SD is the standard distance
min means that the minimum of SD and $\sqrt{1/\ln 2}\, D_m$ will be used
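The default-bandwidth rule above can be sketched as follows. The exponent on n is redacted in this text, so −0.2 is assumed here as an illustrative value and exposed as a parameter; treat this as a sketch rather than ESRI's exact formula:

```python
# Default bandwidth h = 0.9 * min(SD, sqrt(1/ln 2) * Dm) * n**exponent.
# The exponent is an assumption (see lead-in), not taken from the text.
import math

def default_bandwidth(sd, median_distance, n, exponent=-0.2):
    candidate = math.sqrt(1.0 / math.log(2.0)) * median_distance
    return 0.9 * min(sd, candidate) * n ** exponent
```

Larger datasets yield a smaller bandwidth, and a compact point cloud (small standard distance or median distance) shrinks it further.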
Locational outliers should be removed when selecting the appropriate bandwidth value, as they might lead to a very smooth raster layer. Finally, a weight may also be given to each event (e.g., the number of accidents at an intersection or the number of floors in a building). The weight is the quantity that is spread across the study region through the kernel density estimation and the created surface. This type of analysis might be useful for cross-comparisons with other areal data, for example, to study respiratory problems of people (weighted or unweighted events) in relation to PM2.5 (particulate matter) by using an air pollutant raster map to identify potential spatial correlations and overlap.
3.4 Chapter Concluding Remarks
• Point pattern analysis identifies whether a point pattern is random, clustered or dispersed. For analyzing the spatial distribution of the attribute values of spatial objects, we apply the spatial autocorrelation methods presented in Chapter 4.
                     

• Standard distance is a measure of dispersion (spread). It is a distance expressing the compactness of a set of spatial objects.
• The standard deviational ellipse is more descriptive and more widely used than standard distance.
• Spatial outliers reveal anomalies in a spatial dataset that should be further examined.
• By describing the spatial distribution of events, we attempt to identify the spatial process that leads to this formation.
• A complete spatial randomness process (also called the independent random process, or IRP) is a process wherein spatial objects (or their attribute values) are scattered over geographical space randomly.
• The first-order spatial variation effect occurs when the values or the locations of spatial objects vary from place to place due to the local effect of space.
• The second-order spatial variation effect occurs when there is an interaction among nearby locations. The location, or the value, of an observation is highly influenced by a nearby location or the value of a neighboring observation (the independence assumption of IRP no longer holds).
• Stationarity is the state where (a) every location in space has an equal probability of event placement through a spatial process and (b) there is no interaction among events (independence).
• An anisotropic process occurs when intensity varies according to specific directions.
• L and K functions summarize clustering or dispersion over a range of different distances. This assists in assessing the proper scale of analysis.
• The main advantage of kernel density estimation is that the resulting density function is continuous at all points along the scale.
• Kernel density estimation weights nearby events more heavily than those lying further away using a weighting function, thus applying an attenuating effect in space.
• Kernel density estimates the density of the points rather than their values.
• The kernel function calculates the probability density of an event at some distance from an observed reference point event.
• A weak point of kernel density estimation is the subjectivity in the definition of the bandwidth factor h.
Questions and Answers
The answers given here are brief. For more thorough answers, refer back to the
relevant sections of this chapter.
Q1. What are centrographic statistics? Name the most common centrographic spatial statistics tools. What is their main difference from the corresponding descriptive statistics?
                     

A1. Centrographic statistics are tools used to analyze geographic distributions by measuring the center, dispersion and directional trend of a spatial arrangement. The centrographic statistics in most common use are the mean center, median center, central feature, standard distance and standard deviational ellipse. Centrographic statistics are calculated based on the location of each feature, which is their major difference from descriptive statistics, which concern only the nonspatial attributes of spatial features.
Q2. What are the main differences between locational and spatial outliers?
A2. A spatial outlier should not be confused with a locational outlier. To detect locational outliers, we analyze only the distance of a spatial entity to its neighbors. No other attribute analysis takes place. To detect spatial outliers, we study whether an attribute value deviates significantly from the attribute values of the neighboring entities.
Q3. What are the main characteristics of a spatial outlier?
A3. A spatial outlier does not need to be a locational outlier. Additionally, a specific entity may be labeled a spatial outlier for only a single attribute, while its other attribute values might not deviate from the corresponding attribute values of neighboring entities. Finally, a spatial outlier is not necessarily a global outlier as well, as a spatial outlier is always defined inside a predefined neighborhood.
Q4. What are the assumptions of complete spatial randomness?
A4. There are two basic assumptions of complete spatial randomness:
(a) There is an equal probability of event occurrence at any location in the study region (also called first-order stationarity).
(b) The location of an event is independent of the locations of other events (also called second-order stationarity).
Q5. What are the main spatial point patterns, and from which spatial processes are they generated?
A5. Random: In this type of arrangement, events are randomly scattered all over the study area. This is the result of a random process. Dispersed: The events are located uniformly across the study area. This is the result of a competitive process. Clustered: The events create clusters in some parts of the study area, and the pattern has a large variation. This is the result of an aggregating process.
Q6. Why is complete spatial randomness important for spatial statistics?
A6. If the data exhibit spatial randomness, there is no pattern or underlying process, and further geographical analysis is unnecessary. If the data do not exhibit spatial randomness, then events do not have the same probability of occurrence in space (first-order effect presence) and/or the location of an event depends on the location of other events (second-order effect presence). In this case, spatial analysis is very important, as space has an effect on event occurrences and their attribute values.
                     

Q7. What are the main methods of point pattern analysis, and what is their calculation based on?
A7. Distance-based methods employ the distances among events and describe second-order effects. Such methods include the nearest neighbor method, the G and F distance functions, Ripley's K distance function and its transformation, the L function. Density-based methods use the intensity of event occurrence across space. For this reason, they describe first-order effects better (quadrat count methods, kernel estimation). Other methods include proximity polygons, which combine density and distance (Delaunay triangulation, Gabriel graph, minimum spanning tree), and hot spot analysis, which identifies whether clusters of values of a specific variable are formed in space.
Q8. What is Ripley's K function, and why is it used? What is its main advantage?
A8. Ripley's K function is a spatial analysis method of analyzing point patterns based on a distance function. It can be used to (a) decide whether the observed point pattern is random, dispersed or clustered at a specific distance or range of distances and (b) identify the distance at which the clustering or dispersion is most pronounced. Ripley's K function illustrates how the spatial clustering or dispersion of features' centroids changes when the neighborhood size changes by summarizing the clustering or dispersion over a range of different distances. This is very helpful because it assists us in assessing the proper scale of analysis for the problem at hand.
Q9. What is kernel density estimation, and how is it mapped?
A9. Kernel density estimation is a nonparametric method that uses kernel functions to create smooth maps of density values, in which the density at each location indicates the concentration of points within the neighboring area (high concentrations as peaks, low concentrations as valleys). The kernel function provides an interpolation technique for assessing the impact of each event on its neighborhood (as defined by the bandwidth). In practice, the kernel function is applied to a limited area around each event, defined by the bandwidth h, to "spread" its effect across space. The density estimates can be displayed by either surface maps or contour maps that show the intensity at all locations.
Q10. When should kernel density estimation be used?
A10. Kernel density estimation is used to create smooth surfaces that depict, first, the density of events and, second, an estimation of areas of higher or lower event occurrence intensity. Kernel density estimation can be used in spatial analysis for hot spot and cold spot identification. It can also be applied for cluster analysis of weighted or unweighted events in epidemiology, criminology, demography, ethnology and urban analysis. Through the raster map visualization of a kernel density estimation, one can quickly locate hot spot and cold spot areas and places of higher or lower density.
                     

LAB 3
SPATIAL STATISTICS: MEASURING GEOGRAPHIC DISTRIBUTIONS
Overall Progress
Figure[REDACTED PHONE] Lab 3 workflow and overall progress
                     

Scope of the Analysis: Crime Analysis
This lab deals with
• Objective 2: Locating low-crime areas (see Table 1.2)
In this exercise we apply various spatial statistics (centrographics, nearest neighbor, Ripley's K, kernel density estimation) to analyze the spatial patterns of assaults (see Figure[REDACTED PHONE]). The crime of assault is considered an act of violence that involves intentional harm. The overall analysis identifies high-crime areas that should be excluded and low-crime areas suitable for locating a coffee shop. Locational outliers are also traced. In Chapter 4, we more thoroughly study other types of crime through spatial autocorrelation and clustering analysis. The following exercises are carried out through ArcGIS only, as GeoDa does not offer point pattern analysis functionalities.
Exercise 3.1 Measuring Geographic Distributions
In this exercise, we calculate the mean and median center of the distribution
of assaults and identify directional trends.
ArcGIS Tools to be used: Mean center, Median center, Standard distance, Standard deviational ellipse
ACTION: Calculate the Mean center and Median center
Navigate to the location where you have stored the book dataset and
click Lab3_SpatialStatistics.mxd
Main Menu > File > Save As > My_Lab3_SpatialStatistics.mxd
in I:\BookLabs\Lab3\Output
ArcToolBox > Spatial Statistics Tools > Measuring Geographic
Distributions > Mean Center (see Figure[REDACTED PHONE])
Input Feature Class = Assaults (see Figure[REDACTED PHONE])
Figure[REDACTED PHONE] Measuring geographic distributions toolbox

Exercise 3.1 (cont. )
Output Feature Class = MC_Assaults (stored in the output folder
of Lab3 as MC_Assaults.shp)
Leave all other fields blank/default
OK
Do the same for Median Center
ArcToolBox >Spatial Statistics Tools >Measuring Geographic
Distributions >Median Center
Input Feature Class = Assaults
Output Feature Class = MedC_Assaults (stored in the output
folder of Lab3 as MedC_Assaults.shp)
Leave all other fields blank/default
OK
Change the symbols of the resulting shapefiles to triangles
TOC >Click on MC_Assaults >Select Triangle 3 >Color = Green
>Size = 12 >OK
Click on MedC_Assaults >Select Square 3 >Color = Green >Size = 12 >OK
Interpreting results: The mean center and median center lie close to each other, very close to the western downtown area (see Figure[REDACTED PHONE]). A quick graphical inspection shows that assaults occur mainly in the central-western, northern and southern postcodes of the city. Central-eastern and eastern postcodes have very few assault events. Areas with a higher
Figure[REDACTED PHONE] Mean center dialog box

Exercise 3.1 (cont. )
concentration of assaults might be removed as location candidates for the new coffee shop. At this point, the mean and median center do not offer much valuable information.
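The quantities computed through the dialogs in this exercise can also be reproduced programmatically; the sketch below shows the mean center and standard distance for a hypothetical list of coordinates (not the book's assault dataset):

```python
# Mean center and standard distance of a set of (x, y) event coordinates.
import math

def mean_center(points):
    n = len(points)
    return (sum(x for x, _ in points) / n,
            sum(y for _, y in points) / n)

def standard_distance(points):
    # root mean squared distance of events from their mean center
    mx, my = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points) / n)
```

A circle of radius equal to the standard distance, centered on the mean center, is exactly what the Standard Distance tool draws in the next step.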
ACTION: Calculate Standard distance and Standard deviational ellipse
ArcToolBox >Spatial Statistics Tools >Measuring Geographic
Distributions >Standard Distance
Input Feature Class = Assaults (see Figure[REDACTED PHONE] )
Output Feature Class = SD_Assaults (stored in the output folder
of Lab3 as I:\BookLabs\Lab3\Output\SD_Assaults.shp)
Leave all other fields blank/default
OK
Figure[REDACTED PHONE] Assault events with mean and median center[REDACTED PHONE] Analyzing Geographic Distributions and Point Patterns

Change the color of the standard distance shapefile
TOC >Click on SD_Assaults >Outline Color = Green >OK
Calculate how many events lie inside one Standard deviation
distance
Main Menu >Selection >Select by location >
Target Layer = Assaults (see figure[REDACTED PHONE] )
Source layer = SD_Assaults
Spatial selection method = are within the source layer file
OK
TOC >RC Assaults (see Figure [REDACTED PHONE]) >Open Attribute Table
Check in the lower right corner of the table: 77 out of 129 selected (which is [REDACTED PHONE]%)
Close the table
Main Menu >Selection >Clear Selected Features
Calculate Standard Deviational Ellipse
ArcToolBox >Spatial Statistics Tools >Measuring Geographic
Distributions >Directional Distribution
Input Feature Class = Assaults (see Figure[REDACTED PHONE] )
Output Feature Class = SDE_Assaults (stored in the output folder
of Lab3 as I:\BookLabs\Lab3\Output\SDE_Assaults.shp)
Figure [REDACTED PHONE] Standard distance dialog box.

Leave all other fields blank/default
OK
Calculate how many events lie inside one standard deviational
ellipse
Main Menu >Selection >Select by location >
Target Layer = Assaults
Source layer = SDE_Assaults
Spatial selection method = are within the source layer file
OK
TOC >RC Assaults >Open Attribute Table
Check in the lower right corner of the table: 75 out of [REDACTED PHONE] selected ([REDACTED PHONE]%)
Main Menu >Selection >Clear Selected Features
Close attribute table
Main Menu >File >Save
Figure [REDACTED PHONE] Select by location tool.

Figure [REDACTED PHONE] Standard deviational ellipse dialog box.
Figure [REDACTED PHONE] Standard distance (SD = 1,[REDACTED PHONE] m) of assaults centered over the mean center;
77 out of [REDACTED PHONE] (highlighted) lie within one standard distance from the mean center.

Interpreting results: The standard distance is 1,[REDACTED PHONE] m (attribute table of
SD_Assaults, last column; see Figure [REDACTED PHONE]); 77 out of [REDACTED PHONE] events ([REDACTED PHONE]%) lie
less than one standard distance from the mean center, a relatively small area
compared to the city, revealing the concentration of assaults. Areas farther away
have a lower risk of assault and are probably safer and more suitable for locating the
coffee house.
We also observe that there is a directional trend in the event distribution,
which is depicted by the standard deviational ellipse (see Figure [REDACTED PHONE]).
The results reveal a south-to-north tendency. Additionally, one standard
deviational ellipse covers [REDACTED PHONE]% (75 out of [REDACTED PHONE]) of the assaults.
Figure [REDACTED PHONE] Assault events plotted along with the standard distance and standard
deviational ellipse. The ellipse reveals a south-to-north directional trend. The
dashed line splits the study area in two. The majority of assaults lie on the left-hand
side, while the right-hand side has only a few assault occurrences. The overall
pattern shows substantial heterogeneity.
Although this is similar to the percentage produced by the standard distance, it is
more informative because it also provides the direction, allowing for a better
tracing of areas that might be excluded due to higher crime rates (see
Figure [REDACTED PHONE]). Another interesting finding is that the study area seems to be
split in half. If we consider the north-to-south axis passing through the center
of the downtown area (denoted by the red outline), most of the assaults lie
on the left-hand side of the axis, while only a few lie on the right-hand side.
This reveals heterogeneity in the west-to-east direction (see Figure [REDACTED PHONE]).
Furthermore, a graphical inspection (see Figure [REDACTED PHONE]) shows clusters of
events on the left-hand side. The centrographic measures used provide a description
of central tendencies and directional trends but fail to spot the areas
in which crime clusters away from the center. To better describe the assault
point pattern, we apply more advanced tools in the following exercises.
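As a rough numerical companion to these measures, the standard distance and the ellipse axes can be sketched with numpy. This is a simplification of the ArcGIS formulas (it derives the axes from the eigenvalues of the coordinate covariance matrix) and runs on hypothetical coordinates, not the book's data:

```python
import numpy as np

def standard_distance(points):
    """Radius of the standard distance circle around the mean center."""
    c = points.mean(axis=0)
    return np.sqrt(((points - c) ** 2).sum(axis=1).mean())

def standard_deviational_ellipse(points):
    """Axis lengths and orientation of a standard deviational ellipse.

    Simplified relative to the ArcGIS formulas: the axes are square
    roots of the covariance-matrix eigenvalues, and theta is the angle
    (radians) of the major axis.
    """
    d = points - points.mean(axis=0)
    cov = (d.T @ d) / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    sx, sy = np.sqrt(eigvals[1]), np.sqrt(eigvals[0])
    theta = np.arctan2(eigvecs[1, 1], eigvecs[0, 1])
    return sx, sy, theta

# hypothetical coordinates stretched along the y (south-to-north) axis
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2)) * np.array([1.0, 3.0])
sd = standard_distance(pts)
sx, sy, theta = standard_deviational_ellipse(pts)
print(sd, sx, sy)   # major axis (sx) is the stretched north-south one
```

Because the standard distance squared equals the trace of the covariance matrix, it also equals the sum of the squared ellipse axes, which is a handy consistency check.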
Exercise 3.2 Point Pattern Analysis
In this exercise, we go beyond merely describing the geographic distribution of
assaults (done in Exercise 3.1). We analyze the spatial pattern to identify whether it
is clustered, random or dispersed. We also investigate whether any specific spatial
process generates the observed pattern and whether this pattern is of any particular
importance to our project. Finally, we determine the scale of our analysis.
ArcGIS Tools to be used: Average Nearest Neighbor, Ripley's K
ACTION: Average Nearest Neighbor
Navigate to the location you have stored the book dataset and
click
My_Lab3_SpatialStatistics.mxd
ArcToolBox >Spatial Statistics Tools >Analyzing Patterns >
Average Nearest Neighbor
Input Feature Class = Assaults (see Figure[REDACTED PHONE] )
Distance Method = Euclidean Distance
Check Generate Report
Leave all other fields blank/default
OK

To open the report, go to the Results window. If the Results window is
not activated:
Main Menu >Geoprocessing >Results
Current Session >Click the plus sign at the left of Average
Nearest Neighbor >Double Click Report File: NearestNeighbor_Result.html
(see Figure [REDACTED PHONE])
Report opens
The z-score (−4.19) is less than −2.58, indicating a p-value smaller than
[REDACTED PHONE] (far left of the graph; see Figure [REDACTED PHONE]). There is a less than 1% likelihood
that this clustered pattern could be the result of complete spatial
randomness.
Close report and the Results window
Save
Figure [REDACTED PHONE] Average nearest neighbor dialog box.
Figure [REDACTED PHONE] Average nearest neighbor report. Results are: Nearest Neighbor
Ratio = [REDACTED PHONE], z-score = −4.19, p-value = [REDACTED PHONE].

Interpreting results: The nearest neighbor ratio is [REDACTED PHONE] (less than 1) and is
statistically significant at the [REDACTED PHONE] p-value level, indicating a process toward
clustering (see Figure [REDACTED PHONE]). Given the z-score of −4.19 and the p-value
of [REDACTED PHONE], there is a less than 1% likelihood (significance level α) that
this clustered pattern could be the result of complete spatial randomness. In
other words, we have a 99% probability (confidence level) that the distribution
of assaults forms a clustered pattern (see Figure [REDACTED PHONE]). Thus, crime is not
distributed randomly across space, and some places have a higher probability
of assault occurrence than others. Nearest neighbor analysis does not
trace where these clusters are; these areas still have to be located. To this end,
we apply kernel density estimation (see Exercise 3.3) by defining an appropriate
bandwidth h. One method of doing so is by using Ripley's K function
and its L transformation, as shown next.
Figure [REDACTED PHONE] Graphical explanation of the average nearest neighbor statistic output.
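The ratio and z-score reported above can be reproduced in spirit with a small numpy sketch. This uses the standard complete-spatial-randomness formulas without edge correction, on hypothetical points rather than the book's assault layer:

```python
import numpy as np

def average_nearest_neighbor(points, area):
    """Nearest neighbor ratio and z-score (CSR formulas, no edge correction).

    points: (n, 2) coordinates; area: study-area size in squared map units.
    """
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)           # ignore self-distances
    d_obs = dist.min(axis=1).mean()          # observed mean NN distance
    d_exp = 0.5 / np.sqrt(n / area)          # expected under randomness
    se = 0.26136 / np.sqrt(n ** 2 / area)    # standard error of d_obs
    return d_obs / d_exp, (d_obs - d_exp) / se

# hypothetical, tightly clustered points in a 100 x 100 study area
rng = np.random.default_rng(1)
pts = rng.normal(loc=50, scale=2, size=(100, 2))
ratio, z = average_nearest_neighbor(pts, area=100 * 100)
print(ratio, z)   # ratio well below 1 with a strongly negative z: clustered
```

A ratio below 1 with z < −2.58 mirrors the interpretation above: the pattern is clustered at the 99% confidence level.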

ACTION: Ripley's K Function
Before we use Ripley's K function, we create the boundary of the
study area (needed for the tool).
Main Menu >Geoprocessing >Dissolve
Input Feature Class = City (see Figure[REDACTED PHONE] )
Output Feature Class = I:\BookLabs\Lab3\Output\Boundaries.shp
Leave all other fields blank/default
OK
TOC >Click the polygon symbol of Boundaries >Select Hollow >
Outline Width = 1 >OK
Now we proceed with Ripley's K function.
ArcToolBox >Spatial Statistics Tools >Analyzing Patterns >
Multi-Distance Spatial Cluster Analysis
Input Feature Class = Assaults (see Figure[REDACTED PHONE] )
Output Table = I:\BookLabs\Lab3\Output\RipleyTableAssaults
Figure [REDACTED PHONE] Dissolve GIS operation dialog box, needed to create the case study boundaries.

Number of Distance Bands = 10
Compute Confidence Envelope = 99_PERMUTATIONS
Check Display Results Graphically
Boundary Correction Methods = SIMULATE_OUTER_BOUNDARY_VALUES
(This method is used to simulate points outside the study area
so that the number of neighbors close to the boundaries is not
underestimated.)
Study Area Method = USER_PROVIDED_STUDY_AREA_FEATURE_CLASS
Study Area Feature Class = Boundaries
Leave all other fields blank/default
OK
Results are
k-Function Summary
Distance* L(d) Diff Min L(d) Max L(d)
[REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE].62
Figure [REDACTED PHONE] Ripley's K function dialog box.

[REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE][REDACTED PHONE].12
RC on graph >Add to Layout >Return to Data View >Close graph
Results are saved in the RipleyTableAssaults table.
TOC >RC RipleyTableAssaults >Open
Main Menu >File >Save
Figure [REDACTED PHONE] Ripley's K function graph. Distances range from [REDACTED PHONE] m to 2,[REDACTED PHONE] m, with
10 intervals of around [REDACTED PHONE] m each. The observed K values are indicated by the
red line.

Interpreting results: The observed K value is larger than the expected value
for all distance values within the range ([REDACTED PHONE] m–2,[REDACTED PHONE] m), indicating that the
distribution of the events tends to be a clustered pattern rather than a
random one (see Figure [REDACTED PHONE]). The observed K value is larger than the upper
confidence envelope (Max L(d)) value for all distances, showing that the
spatial clustering at each distance is statistically significant. Ripley's K
function thus indicates a statistically significant clustering of assaults
at various distances. The larger the difference between the observed and
the expected value, the more intense the clustering at that distance (Diff).
As such, the spatial clustering of assaults is most evident at 1,[REDACTED PHONE] m
(where the difference is maximized; see the results listed earlier). From the
analysis perspective, this means that, if we would like to analyze this point
pattern further, a good scale of analysis (the distance at which spatial weights
are calculated for spatial statistics) would be 1 km. We could also use a
smaller distance, such as [REDACTED PHONE] m, which has the second-largest difference. It
is not easy to define which distance bandwidth is the most appropriate; with
Ripley's K function, however, we obtain a measure of the distances at
which clustering seems to be more pronounced. In the next exercise, the
1 km bandwidth is applied, but you can experiment with other bandwidths,
using the preceding table as a guide and then analyzing the results in the
context of the specific analysis. There is no single correct bandwidth
value; the choice largely depends on the problem, the scope and the
regional or local type of the analysis.
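For intuition, a naive version of Ripley's K and its L transformation can be sketched in numpy. This sketch omits the boundary corrections that tools like the ArcGIS one apply, and uses a hypothetical clustered pattern:

```python
import numpy as np

def ripleys_k(points, area, radii):
    """Naive Ripley's K (no edge correction) at the given radii.

    K(d) = area * (mean number of neighbors within d) / (n - 1);
    under complete spatial randomness K(d) is about pi * d**2.
    """
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)
    return np.array([area * (dist < d).sum() / (n * (n - 1)) for d in radii])

def l_transform(k):
    """Besag's L transformation: L(d) = sqrt(K(d) / pi)."""
    return np.sqrt(k / np.pi)

# hypothetical clustered pattern: two tight groups in a unit square
rng = np.random.default_rng(2)
pts = np.vstack([rng.normal(0.3, 0.03, (50, 2)),
                 rng.normal(0.7, 0.03, (50, 2))])
radii = np.linspace(0.05, 0.25, 5)
k_obs = ripleys_k(pts, area=1.0, radii=radii)
print(l_transform(k_obs) - radii)   # positive values indicate clustering
```

With the L transformation, L(d) − d above zero flags clustering at distance d, matching the "observed above expected" reading of the ArcGIS graph.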
Exercise 3.3 Kernel Density Estimation
In this exercise, we use kernel density estimation to create a smooth map of
density values, in which the density at each location indicates the concentration
of assaults (high concentrations of assaults as peaks, low concentrations
of assaults as valleys; see Figure [REDACTED PHONE]). Kernel density tools reflect
the probability of event occurrence, which is very useful for identifying
areas that have a high or low risk of assault.
ArcGIS Tools to be used: Kernel Density, Reclassify, Raster to Polygon
ACTION: Kernel density estimation
Navigate to the location you have stored the book dataset and click

My_Lab3_SpatialStatistics.mxd
ArcToolBox >Spatial Analyst Tools >Density >Kernel Density
Input point = Assaults (see Figure[REDACTED PHONE] )
Population field = NONE
Output Raster = I:\BookLabs\Lab3\Output\KDEAssault1km
Output cell size = Leave default
Search radius = [REDACTED PHONE]
Leave the other fields as default
OK
TOC >Drag the layer above the City layer (you should first
click the List By Drawing Order button) (see Figure [REDACTED PHONE])
Main Menu >File >Save
Interpreting results: The kernel density raster output at a 1 km bandwidth is
depicted in Figure [REDACTED PHONE] (bandwidth estimated by Ripley's K function in
Exercise 3.2). This map highlights areas at higher and lower risk of assault
occurrence. Several assault hot spots are clearly identified. Light grey areas
(valleys) indicate low-intensity crime areas. The darker the grey color (peaks),
Figure [REDACTED PHONE] Kernel density dialog box.

the higher the intensity of assaults and the probability of crime occurrence.
The crime pattern resembles a polycentric model. There is one large central
crime hot spot covering almost half of the downtown area (its western part)
and four smaller hot spots allocated around the main hot spot on its western,
northern and southern sides. The raster file does not cover several parts of
the map. In fact, the intensity is not estimated for the eastern postcodes of
the city because no assault events were recorded there. However, the
outcome of the kernel density applies only to its coverage area and cannot
be generalized to the uncovered area. In the context of the one-year timespan
the crime data refer to, we can state that no assaults were reported at
specific places, and density maps cannot be produced there. Nevertheless, the
current crime data clearly indicate areas of crime clusters and areas where
crime is likely to emerge in the future. Areas of high crime intensity should
be avoided when locating the coffee shop (see Boxes 3.1 and 3.2).
Figure [REDACTED PHONE] Kernel density estimation map of crimes. Crime events, mean center,
median center, standard distance and standard deviational ellipse are also overlaid.
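The kernel density idea itself can be sketched outside ArcGIS. The minimal version below uses a quartic (biweight) kernel, which is common in GIS density tools, though the exact constants may differ from the ArcGIS implementation; the events and grid are hypothetical:

```python
import numpy as np

def kernel_density(points, grid_x, grid_y, bandwidth):
    """Quartic-kernel density surface on a regular grid.

    For each grid cell, sum kernel weights of events closer than the
    bandwidth (search radius); cells with no events within the
    bandwidth get a density of exactly zero.
    """
    gx, gy = np.meshgrid(grid_x, grid_y)
    density = np.zeros_like(gx)
    for px, py in points:
        d2 = (gx - px) ** 2 + (gy - py) ** 2
        inside = d2 < bandwidth ** 2
        density += np.where(inside, (1 - d2 / bandwidth ** 2) ** 2, 0.0)
    return density * 3 / (np.pi * bandwidth ** 2)

# hypothetical events concentrated near (2, 2) on a 0..10 map
rng = np.random.default_rng(3)
pts = rng.normal(2, 0.5, size=(100, 2))
xs = ys = np.linspace(0, 10, 101)
surface = kernel_density(pts, xs, ys, bandwidth=1.0)
# the surface peaks near the cluster and is zero far away from all events
```

The zero cells far from any event correspond to the uncovered parts of the raster discussed above: the tool simply has no events within the search radius there.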

Box 3.1 It is beyond the scope of this project to explain why these hot
spots exist in these specific areas. Further analysis could investigate the
demographic and socioeconomic profiles of these areas, the urban patterns
formed, the living conditions and many other factors. This example
shows that spatial analysis assists in identifying, locating and quantifying
interesting spatial patterns that can be further studied from a multidisciplinary
perspective, allowing for a better analysis of various phenomena.
Box 3.2 Analysis Criterion C2 to be used in synthesis Exercise 5.4: The
location of the coffee shop should not lie within areas of high assault
densities. [C2_AssaultsHighDensity.shp]
We regard as high assault density areas all those with a KDE value larger
than 20. To trace these areas, we have to reclassify the raster image to group
values into integer intervals and then convert it to a shapefile. You may
complete this task when you reach Exercise 5.4.
ACTION: Convert high-density areas to shapefile (see Box 3.2 )
ArcToolBox >Spatial Analyst Tools >Reclass >Reclassify
Input Raster = KDEAssault1km (see Figure[REDACTED PHONE] )
Reclass field = VALUE
Classify >Classes = 2 >Break Values = 20 (only change first
row) >OK
Output raster >In the window that opens, create a new Geodatabase
(by selecting the relevant icon) within Lab3\Output and name it
Raster.gdb >DC Raster.gdb and type RecKDE (This saves the file
to I:\BookLabs\Lab3\Output\Raster.gdb\RecKDE)
OK

ACTION: Export high density crime areas to shapefile
ArcToolBox >Conversion Tools >From Raster >Raster to Polygon
Input Raster = RecKDE
Output polygon features = I:\BookLabs\Lab3\Output\AssaultsDensity.shp
OK
Figure [REDACTED PHONE] Reclassify dialog box.

Main Menu >Selection >Select By Attributes
Layer = AssaultsDensity
SELECT * FROM AssaultsDensity Where: "GRIDCODE" = 2
OK
TOC >RC AssaultsDensity >Data >Export Data >Output feature class:
I:\BookLabs\Lab3\Output\C2_AssaultsHighDensity.shp
Main Menu >File >Save
Figure [REDACTED PHONE] High-density assaults area in light blue.

Exercise 3.4 Locational Outliers
In this exercise, we determine if any postcodes can be characterized as
locational outliers. The outcome of this exercise will be used in Lab 4 for
spatial autocorrelation estimation. First, we calculate the centroids of the
polygons, and then we calculate the distance of every centroid (point) to its
nearest one. ArcGIS considers outliers to be those objects that lie more than
three standard deviations (of the nearest neighbor distance) away from their closest neighbor.
ArcGIS Tools to be used: Feature To Point, Near, Histogram.
ACTION: Calculate centroids
Navigate to the location you have stored the book dataset and
click
My_Lab3_SpatialStatistics.mxd
ArcToolBox >Data Management Tools >Features >Feature To Point
Input Features = City (see Figure[REDACTED PHONE] )
Output Feature Class = I:\BookLabs\Lab3\Output\CityCentroids.
shp
OK
ACTION: Calculate nearest neighbor distances
ArcToolBox >Analysis Tools >Proximity >Near
Input Features = CityCentroids
Figure [REDACTED PHONE] Feature to point dialog box.

Near Features = CityCentroids
Leave all other fields blank/default
OK
The nearest distance column is added to the attribute table of CityCentroids (see Figure [REDACTED PHONE])
Figure [REDACTED PHONE] Map of polygon centroids.

ACTION: Calculate nearest neighbor distance average and standard
deviation
TOC >RC CityCentroids >Open Attribute Table >RC on NEAR_DIST
column >Statistics
Close Statistics table
Close Table
Main Menu >File >Save
Interpreting results: The distribution of the variable "Nearest neighbor
distance" (the distance of each point to its nearest neighbor) is slightly
skewed, with an average nearest neighbor distance of [REDACTED PHONE] m and a
standard deviation of [REDACTED PHONE] m (see Figure [REDACTED PHONE]). We regard as locational
outliers those lying more than three standard deviations from the mean,
which is [REDACTED PHONE] + 3*[REDACTED PHONE] = 1,[REDACTED PHONE] m. In this dataset, no object (postcode)
lies more than 1,[REDACTED PHONE] m from its nearest neighbor, so no locational
outlier is identified. The existence of locational outliers can also be easily
detected by using the Optimized Hot Spot Analysis tool (see the next
chapter). However, that tool does not indicate which objects are outliers;
it indicates only how many outliers exist. Sometimes, this is sufficient.
However, if we need to examine the location of these outliers, the preceding
analysis is preferred.
Figure [REDACTED PHONE] Nearest neighbor ID and nearest distance for each point are added to the
attribute table of the CityCentroids layer.
Figure [REDACTED PHONE] Frequency distribution of the variable "Nearest neighbor distance."
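The mean + 3·std rule for nearest neighbor distances can be sketched as follows. This is a rough numpy stand-in for the Near tool plus the statistics step, run on hypothetical centroids rather than the book's postcode layer:

```python
import numpy as np

def locational_outliers(points, k=3.0):
    """Flag points whose nearest neighbor distance exceeds mean + k*std.

    Mirrors the exercise's rule (average NN distance plus three standard
    deviations); returns the NN distances and a boolean outlier mask.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)
    nn = dist.min(axis=1)                    # distance to nearest neighbor
    return nn, nn > nn.mean() + k * nn.std()

# hypothetical centroids: a compact group plus one far-away point
rng = np.random.default_rng(4)
pts = np.vstack([rng.uniform(0, 10, (30, 2)), [[100.0, 100.0]]])
nn, mask = locational_outliers(pts)
print(mask.sum())   # only the distant point is flagged
```

Unlike the count produced by Optimized Hot Spot Analysis, the boolean mask tells you which objects are the outliers, mirroring the point made above.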

4 Spatial Autocorrelation
THEORY
Learning Objectives
This chapter deals with
• Spatial autocorrelation and its importance to geographical problems
• Global and local spatial autocorrelation techniques like Moran's I, Getis-Ord G and Geary's C
• Tracing spatial clusters of high values (hot spots) or low values (cold spots)
• Tracing spatial outliers
• Optimized hot spot analysis
• Interpreting the statistical significance of results
• Incremental spatial autocorrelation used to define the appropriate scale of analysis
• The multiple comparison problem and spatial dependence
• Introducing Bonferroni correction and the false discovery rate
• Spatiotemporal autocorrelation analysis using the bivariate and differential local Moran's I index
• Presenting step-by-step examples using ArcGIS and GeoDa
After a thorough study of the theory and lab sections, you will be able to
• Distinguish between global and local spatial autocorrelation
• Understand why spatial autocorrelation analysis is relevant to geographical analysis
• Apply local and global indices of spatial autocorrelation like local Moran's I and the Getis-Ord Gi and Gi* statistics
• Use the Moran's I scatter plot to identify patterns
• Identify hot spots or cold spots
• Identify and locate spatial outliers
• Use bivariate and differential local Moran's I to identify if spatiotemporal autocorrelation exists and if changes cluster over time

• Apply these tools using ArcGIS
• Interpret the results from both the statistical significance and spatial analysis standpoints
4.1 Spatial Autocorrelation
Definition
Spatial autocorrelation is the degree of spatial dependency, association or
correlation between the value of an observation of a spatial entity and the
values of neighboring observations of the same variable. The terms "spatial
association" and "spatial dependence" are often used to reflect spatial
autocorrelation as well.
Why Use
Spatial autocorrelation is examined to determine if relationships exist among
the attribute values of nearby locations and if these values form patterns
in space.
Interpretation
According to the first law of geography, objects in a neighborhood tend to
have more similarities and interactions than those lying farther away. This is
what we call "spatial dependency." To measure spatial dependency, we use
spatial autocorrelation metrics. Put simply, spatial autocorrelation measures
how much the value of a variable at a specific location is related to the values
of the same variable at neighboring locations.
The spatial autocorrelation concept is similar to that of the statistical
correlation used for nonspatial variables. Still, there is a major difference. While
statistical correlation refers to two distinct variables with no reference to
location, spatial autocorrelation refers to the value of a single variable at a
specific location in relation to the values of the same variable at neighboring
locations.
In statistical correlation, if two variables tend to change in similar ways (e.g.,
higher income correlated with higher educational attainment), we have positive
correlation. Likewise, if similar values of a variable (either high or low) in a
spatial distribution tend to collocate, we also have positive spatial autocorrelation.
Positive spatial autocorrelation is the state where "data from locations
near one another in space are more likely to be similar than data from locations
remote from one another" (O'Sullivan & Unwin [REDACTED PHONE], p. 34). In other words,
autocorrelation (or self-correlation) exists when an attribute variable of a spatial
dataset correlates with itself at specific distances, called lags. This means that
location affects the values of the variable in such a way that promotes values

clustering in specific areas. A typical example of positive spatial autocorrelation
is the income distribution within a city. Households with higher incomes
generally tend to cluster in specific regions of the city, while households with lower
incomes tend to cluster in other regions.
With negative spatial autocorrelation, on the other hand, neighboring spatial
entities tend to have different values. This is similar to negative correlation,
where high values of one variable indicate low values of the other. When there
is no spatial autocorrelation, there is a random distribution of the values in
relation to their locations, with no apparent association among them.
Discussion and Practical Guidelines
Spatial autocorrelation analysis is extremely important in geographical studies.
If spatial autocorrelation did not exist, geographical analysis would be of little
interest (O'Sullivan & Unwin [REDACTED PHONE], p. 34). Think about it. We perform
geographical analysis because we assume that location matters. If it did not, geography
would be irrelevant. In most cases, phenomena do not vary randomly across
space. For example, population concentrates in cities, income concentrates
in cities, temperature displays small fluctuations inside a small area, and rain is
uniform over a relatively small area. A student searching for a seat in an auditorium
is most likely to sit next to a friend (who is already sitting). If you visit a
restaurant, you will sit at an empty table. All these facts reveal nonrandom
patterns. This is why geography is worth studying. If spatial arrangements
were random, the global population could be located in any single
location of the world with the same probability. If this were the case, more
people would be living in the Antarctic or at altitudes above 5,[REDACTED PHONE] m. If
temperature were random, you might experience 30°C weather
while standing outside the front door of your house and 1°C weather by
jumping into the backyard. If rain were random, you might get wet while
sunbathing on a beach in Santorini (Greece) during summer while the fellow
right next to you lies on a sunbed sweating under the sun. In an
auditorium, you will hardly see a student sit in a position that is already
occupied. In a restaurant, it is rare (though not impossible) to sit at a table with
people entirely unknown to you.
The aforementioned examples show that location matters and that a certain
state influences what follows in a nonrandom way. They also remind us of the
first- and second-order effects (see Section [REDACTED PHONE]). By studying location, we
reveal trends and patterns regarding the phenomenon at hand – for example,
the spatial distribution of household income, the pattern of a disease
outbreak, the relationship between residential location and mental well-being
(Liu et al. [REDACTED PHONE]), or the linkage between the built environment and physical health
(Wang et al. [REDACTED PHONE]). Spatial autocorrelation is quite common in geographical
analysis. This does not necessarily mean that it will occur across the entire study
area (known as global autocorrelation). Spatial autocorrelation is sometimes

evident only in subregions of the study area (known as local autocorrelation). In
any case, spatial autocorrelation reveals the nonrandom distribution of the
phenomenon being studied.
The nonrandom geographical distribution of the values of the variables
under study has significant effects on the accuracy of classical statistics. In
conventional statistics, the observed samples are assumed to be independent.
In the presence of spatial autocorrelation, this assumption is violated. Observations
are now spatially clustered or dispersed. This typically means that
classical statistical tools are no longer valid. For example, linear regression
would lead to biased estimates or exaggerated precision (Gangodagamage
et al. [REDACTED PHONE], p. 34). Bias leads to overestimation or underestimation of a population
parameter, and exaggerated precision leads to a higher likelihood of getting
statistically significant results when, in reality, we should have gotten fewer (de
Smith [REDACTED PHONE], p. [REDACTED PHONE]). In addition, spatial autocorrelation implies redundancy in the
dataset. Each newly selected sample is expected to provide less new information,
affecting the calculation of confidence intervals (O'Sullivan & Unwin
[REDACTED PHONE]). That is why we should use spatial statistics when analyzing spatial data
and perform a spatial autocorrelation analysis before conducting any conventional
statistical analysis.
Several diagnostic measures can be used to identify spatial autocorrelation.
Those that estimate spatial autocorrelation by a single value for the entire
study area are named global spatial autocorrelation measures. The most
commonly used are
• Moran's I index
• General G-statistic
• Geary's C index
As mentioned, it is unlikely that any spatial process will be homogeneous across the
entire area due to the nonuniformity and noncontinuity of space. The magnitude
of spatial autocorrelation may vary from place to place due to spatial
heterogeneity. To estimate spatial autocorrelation at the local level, we use
local measures of spatial autocorrelation, like
• Local Moran's I index
• Getis-Ord Gi and Gi* statistics
With such metrics, we describe spatial heterogeneity (in the distribution
of the values of a variable), as they identify hot or cold spots, clusters and
outliers.
Spatial autocorrelation (either positive or negative) is a key concept in geographic
analysis. A test of global or local spatial autocorrelation should be
conducted prior to any other advanced statistical analysis when dealing with
spatial data. Note that correlation does not necessarily imply causation; it implies
only association. Relationships of cause and effect should always be established

only after thorough analysis in order to avoid erroneous linkages (see Section [REDACTED PHONE] for more discussion on this).
4.2 Global Spatial Autocorrelation
[REDACTED PHONE] Moran's I Index and Scatter Plot
Definition
Moran's I index computes global spatial autocorrelation by taking into account
feature locations and attribute values (of a single attribute) simultaneously
(Moran [REDACTED PHONE]). It is calculated by the following formula (4.1):

I = \frac{n}{\sum_{i}^{n}\sum_{j}^{n} w_{ij}} \cdot \frac{\sum_{i}^{n}\sum_{j}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i}^{n}(x_i - \bar{x})^2}    (4.1)
where
n is the number of spatial features
x_i is the attribute value of feature i (remember that a variable is also called
an attribute in the spatial analysis context)
x_j is the attribute value of feature j
\bar{x} is the mean of this attribute
w_{ij} is the spatial weight between features i and j
\sum_{i}^{n}\sum_{j}^{n} w_{ij} is the aggregation of all spatial weights
The tool calculates the mean x̄, the deviation from the mean (x_i - x̄) and the data variance Σ(x_i - x̄)²/n (denominator). Deviations from all neighboring features are multiplied to create cross-products (the covariance term). Then, the covariance term is multiplied by the spatial weight. All other parameters are used to normalize the value of the index. For example, the aggregation of spatial weights is used to normalize for the number of adjacencies. By the same means, the variance is used to ensure that the index value will not be large just because of a large variability in x values (O'Sullivan & Unwin 2010).
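As a worked sketch of the calculation just described, the following plain-Python snippet computes Equation (4.1) on a 5 x 5 grid of 0/1 values with binary rook (edge-sharing) contiguity. The helper names (`rook_weights`, `morans_i`) are ours for illustration, not functions of ArcGIS or GeoDa.

```python
# Illustrative sketch of Eq. (4.1): global Moran's I with binary rook weights.

def rook_weights(rows, cols):
    """Binary rook-contiguity weights for a rows x cols grid.
    Returns a dict {(i, j): 1.0} over ordered pairs of cell indices."""
    def idx(r, c):
        return r * cols + c
    w = {}
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[(idx(r, c), idx(rr, cc))] = 1.0
    return w

def morans_i(values, w):
    """Global Moran's I (Eq. 4.1) for a list of values and a weights dict w."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]                    # deviations from the mean
    s0 = sum(w.values())                                # aggregation of all weights
    cross = sum(wij * dev[i] * dev[j] for (i, j), wij in w.items())
    ss = sum(d * d for d in dev)                        # sum of squared deviations
    return (n / s0) * (cross / ss)

# Perfectly dispersed 5 x 5 checkerboard of 0s and 1s:
values = [(r + c) % 2 for r in range(5) for c in range(5)]
w = rook_weights(5, 5)
print(round(morans_i(values, w), 6))   # ≈ -1.0: strong negative autocorrelation
print(round(-1 / (25 - 1), 3))         # expected value under randomness, -1/(n-1)
```

The checkerboard yields an index far below the expected value -1/(n-1), matching the negative-autocorrelation reading discussed below; a grouped pattern of the same values yields a strongly positive index.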
Why Use
Global Moran's I is used as a metric of global spatial autocorrelation. It is mostly used for areal data, along with ratio or interval data.
Interpretation and Moran's I Scatter Plot
Moran's I index is an inferential statistic. It is interpreted based on the expected value calculated (Eq. 4.2) under the null hypothesis of no spatial autocorrelation (complete spatial randomness) and is statistically evaluated using a p-value and a z-score (just as any common inferential statistic; see Section 2.5). The expected value for a random pattern is (4.2):
    E(I) = \frac{-1}{n-1}    (4.2)

where n denotes the number of spatial entities.
The expected value is the value that would have resulted if the specific dataset were the result of complete spatial randomness. The more spatial objects there are, the closer the expected value tends to zero. The observed index value is the Moran's I index value calculated for the specific dataset through Equation (4.1). Positive Moran's I index values (observed) significantly larger than the expected value E(I) indicate clustering and positive spatial autocorrelation (i.e., nearby locations have similar values). Negative Moran's I index values (observed) significantly smaller than the expected value E(I) indicate negative spatial autocorrelation, meaning that neighboring locations have dissimilar values. Values close to the expected value indicate no autocorrelation.
The difference between the observed and expected values has to be evaluated based on a z-score and a p-value. Through these metrics, we assess if this difference is statistically significant.
•  If the p-value is large (usually p > 0.05), the results are not statistically significant, and we cannot reject the null hypothesis. The interpretation in statistical jargon is that we cannot reject the null hypothesis that the spatial distribution of the values is the result of complete spatial randomness, due to a lack of sufficient evidence.
•  A small p-value (usually p < 0.05) indicates that we can reject the null hypothesis of complete spatial randomness and accept that spatial autocorrelation exists:
   -  In such a case, when the z-value is positive, there is positive spatial autocorrelation and a clustering of high or low values. Nearby locations will have similar values on the same side of the mean (O'Sullivan & Unwin 2010).
   -  If the z-value is negative, there is negative spatial autocorrelation and a dispersed pattern of values. Nearby locations will have dissimilar attribute values on opposite sides of the mean (i.e., a feature with a high value repels other features with low values).
Let us consider an example. Imagine a spatial arrangement of 25 spatial objects (e.g., postcodes) with attribute values of either 1 (white) or 0 (black; see Figure 4.1).
•  In a perfectly dispersed pattern, squares are located so that each one has neighbors of the opposite value. Spatial autocorrelation exists, as the squares have a competitive spatial relationship: if one square is black, its neighbors are white. Moran's I gets a negative value, smaller than the expected value, and there is negative spatial autocorrelation (see far left in Figure 4.1).
•  If the squares are grouped (as in the far right in Figure 4.1), then clustering occurs. There is again spatial autocorrelation, but it is positive. The spatial relationship is that similar values tend to cluster. Moran's I gets a positive value, significantly larger than the expected one.
•  When the values are randomly scattered, then Moran's I value is close to the expected value, and there is zero spatial autocorrelation (see the central section of Figure 4.1).
•  In intermediate states, there is no perfectly uniform status. However, an indication of clustering, dispersion or randomness can be assessed depending on the index value. Higher positive index values show a tendency toward clustering. Lower negative index values show a tendency toward dispersion. Note that we might obtain a low positive Moran's I value while also observing local clusters (see the following discussion). In all cases, we should evaluate the difference between the observed and expected values through p-values and z-scores, as mentioned before.
Figure 4.1 Global Moran's I and spatial autocorrelation.

Moran's I scatter plot is used to visualize the spatial autocorrelation statistic (see Figure 4.2). It allows for a visual inspection of the spatial associations in the neighborhood of each observation (data point). In other words, it provides a representation used to assess how similar an attribute value at a location is to its neighboring ones. Data points are points whose coordinates are the values of the variable X on the x-axis and the spatial lag of the variable X (Lag-X) on the y-axis. Lag-X is the weighted average of the values of X in a specified neighborhood. Both X and Lag-X are used in standardized form. As such, the weighted average is plotted at the center of the graph at the coordinates (0, 0). Distances in the plot are expressed as numbers of standard deviations from
the origin (0, 0). The produced scatter plot identifies which type of spatial autocorrelation exists according to where a dot lies (the dot standing for a spatial entity, such as a polygon). In the case of a polygon layer, a dot in the upper-right corner (Q1) indicates a polygon that has high X and high Lag-X (also called "High-High"). In other words, this polygon has a high value of X and is surrounded by other polygons that also have high values of X. That is why the Lag-X (the average value of these neighboring polygons) is also high. In this case, there is positive spatial autocorrelation. If a dot lies in the lower-left corner (Q3), the polygon has a low value of X and is surrounded by polygons with low values of X (i.e., Low-Low). We thus again have positive spatial autocorrelation. A dot in the upper-left corner (Q4) indicates a polygon with low X surrounded by polygons with high X (i.e., Low-High). This is negative spatial autocorrelation and a strong indication of outlier presence. Finally, a dot in the lower-right corner (Q2) indicates a polygon with high X surrounded by polygons with low X (i.e., High-Low). There is negative spatial autocorrelation and an indication of outlier presence again.
Figure 4.2 Moran's I scatter plot. The four quadrants divide the space into four types of spatial autocorrelation. Each dot in the scatter plot stands for one polygon in the map (in this graph, fewer dots are depicted for clarity's sake). Polygons with high values surrounded by polygons with high values are placed in the upper-right quadrant (Q1). Not all surrounding polygons need to have high values, but the more polygons with similar values there are, the stronger their associations.

Dots can also be compared to a superimposed regression line. The slope of the regression line over the data points equals the Moran's I index value when calculated using binary, row-standardized weights. The closer a dot is to the line, the closer the polygon is to the general spatial autocorrelation trend. The further away a dot lies from the line, the greater the deviation of that spatial unit from the general trend. Points that deviate greatly from the regression line can be regarded as outliers. Potential outliers with respect to the regression line may function as leverage points that distort the Moran's I value. Such observations should be examined further, as in any case of an outlier's presence.
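The scatter-plot construction described above can be sketched in a few lines of plain Python: standardize the variable, compute each observation's spatial lag as the row-standardized (mean-of-neighbors) average, label the quadrant, and recover the slope. The helper names and the tiny six-site transect are ours for illustration.

```python
# Illustrative sketch of the Moran scatter plot quantities (not book software).

def standardize(values):
    """Return values as z-scores (mean 0, population standard deviation 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return [(x - mean) / sd for x in values]

def spatial_lag(z, neighbors):
    """Row-standardized lag: the mean of each observation's neighbors.
    neighbors maps index i -> list of neighboring indices."""
    return [sum(z[j] for j in neighbors[i]) / len(neighbors[i])
            for i in range(len(z))]

def scatter_slope(z, lag):
    """Slope of the lag-versus-z cloud through the origin; with
    row-standardized weights this equals global Moran's I."""
    return sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)

def quadrant(zi, li):
    """Label a dot by scatter-plot quadrant: HH, LL, LH or HL."""
    if zi >= 0:
        return "High-High" if li >= 0 else "High-Low"
    return "Low-High" if li >= 0 else "Low-Low"

values = [10, 12, 11, 3, 2, 4]      # two value regimes along a transect
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
z = standardize(values)
lag = spatial_lag(z, neighbors)
print(scatter_slope(z, lag) > 0)    # positive slope: similar values adjoin
print([quadrant(zi, li) for zi, li in zip(z, lag)])
```

Here the first three sites land in Q1 (High-High) and the last three in Q3 (Low-Low), so the cloud slopes upward, consistent with positive spatial autocorrelation.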
Moran's I index values are bounded to the range -1.0 to +1.0 when the weights are row standardized (see Section 1.9). For most real-world problems, it is hard to find perfectly dispersed (-1) or clustered (+1) patterns. An index score higher than 0.3 is an indication of relatively strong positive autocorrelation, while a score lower than -0.3 is an indication of relatively strong negative autocorrelation (O'Sullivan & Unwin 2010).
If we do not row standardize the weights, Moran's I index might have values beyond the [-1, 1] boundaries. This typically indicates problems with the tool's parameter settings. Examples of such problems are as follows:
•  Values of the attribute in question are skewed. Check the histogram to see if this is the case.
•  Some features have no neighbors or relatively few. The conceptualization of spatial relationships or the distance band should be checked to fix this problem. In skewed distributions, each object should have at least eight neighbors; even this is not always sufficient and has to be determined by the user.
•  Selecting inverse distance often performs well but may produce very small weights, something that should be avoided.
•  Row standardization is not applied but should be (row standardization is suggested most of the time when data refer to polygons).
Discussion and Practical Guidelines
Moran's I index is a global statistic that assesses the overall pattern of a spatial dataset. By contrast, local spatial autocorrelation metrics focus on each spatial object separately within a predefined neighborhood. Statistically significant results for the local Moran's I (e.g., detecting clustering) do not imply statistically significant results for the global Moran's I. Although clusters may exist and be evident at the local level, these clusters may remain unnoticed when we examine the pattern at the global level. Global statistics are more effective when there is a consistent trend across the study area. If global statistics fail to reveal a pattern in the spatial distribution, this does not mean that local statistics will perform in similar ways. On the contrary, we should use them to find localized trends and patterns hidden at the global level.
It should also be mentioned that we use neighborhoods for the global Moran's I calculation, but this does not make the statistic local. The term "global" implies that a single value of the index is produced for the entire pattern. The term "local" means that a value is produced for each spatial object separately. We can thus map local Moran's I, as each spatial object has a local Moran's I value, but we cannot map global Moran's I. To visualize global Moran's I, we use the Moran's I scatter plot.
The neighborhood is also defined in global Moran's I, for two main reasons:
(a) To reduce computational cost: the more objects there are in the data, the more time is required to compute the results.
(b) When we use a distance function (e.g., distance decay), the further away the objects lie, the less impact they have on each other. As a result, there is no need to calculate the index when weights are practically close to zero. Selecting the right cutoff distance is not trivial. It is suggested to start from a distance at which each object has at least one neighbor. In the case of skewed data, each object has to have at least eight neighbors. Alternatively, incremental spatial autocorrelation is a useful method of determining the appropriate cutoff distance, as explained in Section 4.3.
The following are some practical guidelines:
•  Results are reliable if we have at least 30 spatial objects.
•  Row standardization should be applied if necessary. Row standardization is common when we have polygons.
•  When the p-value is low (usually p < 0.05), we can reject the null hypothesis of zero spatial autocorrelation:
   (a) If the z-score is positive, there is positive spatial autocorrelation (clustering tendency).
   (b) If the z-score is negative, there is negative spatial autocorrelation (dispersion tendency).
Finally, potential case studies for which Moran's I index could be used include
•  Examining if income per capita is clustered (socioeconomic analysis)
•  Analyzing consumption behavior (geomarketing analysis)
•  Analyzing house values (economic analysis)
4.2.2 Geary's C Index
Definition
Geary's C index is a statistical index used to compute global spatial autocorrelation (Geary 1954) and is calculated by the following formula (4.3):

    C = \frac{(n-1)\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - x_j)^2}{2\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,\sum_{i=1}^{n}(x_i - \bar{x})^2}    (4.3)
where
n is the total number of spatial objects
x_i is the attribute value of feature i, and x_j is the attribute value of feature j
x̄ is the mean of this attribute
w_ij is the spatial weight between features i and j
ΣΣ w_ij is the aggregation of all spatial weights
Why Use
To trace the presence of global spatial autocorrelation in a spatial dataset.
Interpretation
Geary's C index varies between 0 and 2. A value of 1 typically indicates no spatial autocorrelation. Values significantly smaller than 1 indicate positive spatial autocorrelation, while values significantly larger than 1 indicate negative spatial autocorrelation (O'Sullivan & Unwin 2010).
Discussion and Practical Guidelines
Moran's I offers a global indication of spatial autocorrelation, while Geary's C is more sensitive to differences in small neighborhoods (Zhou et al., p. 69). As such, when we search for global spatial autocorrelation, Moran's I is usually preferred over Geary's C.
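The neighbor-difference structure of Equation (4.3) can be checked with a minimal sketch; the function name and the six-site line example are ours. Because C is built from squared differences between neighbors rather than cross-products, dispersed values push it above 1 and clustered values push it below 1.

```python
# Illustrative sketch of Geary's C (Eq. 4.3); w maps ordered pairs to weights.

def gearys_c(values, w):
    n = len(values)
    mean = sum(values) / n
    s0 = sum(w.values())                                 # aggregation of weights
    num = (n - 1) * sum(wij * (values[i] - values[j]) ** 2
                        for (i, j), wij in w.items())
    den = 2 * s0 * sum((x - mean) ** 2 for x in values)
    return num / den

# Six sites on a line; binary weights over adjacent (ordered) pairs:
line_w = {(i, j): 1.0 for i in range(6) for j in (i - 1, i + 1) if 0 <= j < 6}
print(gearys_c([1, 0, 1, 0, 1, 0], line_w))  # > 1: negative autocorrelation
print(gearys_c([1, 1, 1, 0, 0, 0], line_w))  # < 1: positive autocorrelation
```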
4.2.3 General G-Statistic
Definition
The General G-Statistic is a statistical index used to compute global spatial autocorrelation. It detects clusters of low values (cold spots) or high values (hot spots) and is an index of spatial association (Getis & Ord 1992; O'Sullivan & Unwin 2010). It is calculated by the following formula (4.4):

    G(d) = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(d)\,x_i x_j}{\sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j}, \quad \forall\, j \neq i    (4.4)
where
n is the total number of observations (spatial objects)
x_i is the attribute value of feature i
x_j is the attribute value of feature j
d is the distance within which all pairs (x_i, x_j) lie
w_ij is the spatial weight between features i and j
∀ j ≠ i indicates that features i and j cannot be the same
Why Use
This index is used to distinguish whether the positive spatial autocorrelation detected is due to clustering of high values or to clustering of low values. When clusters of low values coexist with clusters of high values in the same study area, they tend to counterbalance each other; in that case, Moran's I is more suitable to trace the association on the global scale.

Interpretation
The General G-Statistic is inferential, and its results are interpreted based on a rejection (or not) of the null hypothesis that there is complete spatial randomness and, thus, that clusters do not exist. A z-score and a p-value are calculated along with the expected index value. The expected value is the value that would result if the spatial distribution of the values (of the variable being studied) were the outcome of complete spatial randomness. The difference between the observed and expected values is evaluated based on a z-score and a p-value that test if this difference is statistically significant.
•  When the p-value is small (usually p < 0.05), the null hypothesis is rejected, and there is statistically significant evidence of clustering:
   (a) If the z-value is positive, the observed General G-Statistic value is larger than expected, indicating a concentration of high values (hot spots).
   (b) If the z-value is negative, the observed General G-Statistic value is smaller than expected, indicating that low values (cold spots) are clustered in parts of the study area. In both cases, this is an indication of positive autocorrelation.
In cases where the weights are binary or less than 1, the index value is bounded between 0 and 1. This happens because the denominator includes all (x_i, x_j) pairs, regardless of their vicinity, so the numerator will always be less than or equal to the denominator. The final conclusion regarding spatial association should be drawn only after an examination of the p-value and the z-score.
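The hot-spot versus cold-spot contrast can be sketched directly from Equation (4.4). The helper names and the ten-site transect are ours; for binary weights the expected value under randomness is S0/(n(n-1)), where S0 is the sum of weights (Getis & Ord 1992).

```python
# Illustrative sketch of the General G-Statistic (Eq. 4.4), binary weights.
# Attribute values must be positive for this statistic.

def general_g(values, w):
    """General G. w is {(i, j): weight} over ordered pairs with i != j."""
    num = sum(wij * values[i] * values[j] for (i, j), wij in w.items())
    den = sum(values[i] * values[j]
              for i in range(len(values))
              for j in range(len(values)) if i != j)
    return num / den

def expected_g(n, w):
    """Expected value under randomness for binary weights: S0 / (n(n-1))."""
    return sum(w.values()) / (n * (n - 1))

# Ten sites along a transect; neighbors are the adjacent sites (weight 1):
w = {(i, j): 1.0 for i in range(10) for j in (i - 1, i + 1) if 0 <= j < 10}
hot = [9, 9, 9, 1, 1, 1, 1, 1, 1, 1]        # high values cluster together
dispersed = [9, 1, 1, 9, 1, 1, 1, 9, 1, 1]  # same values, highs spread out
print(general_g(hot, w), expected_g(10, w))        # observed G above E(G)
print(general_g(dispersed, w), expected_g(10, w))  # observed G below E(G)
```

With the high values adjacent, the weighted cross-products in the numerator are dominated by large x_i x_j terms, so G exceeds its expectation (a hot-spot signal); spreading the same high values out reverses the inequality.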
Discussion and Practical Guidelines
The General G-Statistic measures the overall (hence "general") clustering of all pairs (x_i, x_j) within a distance d of each other (Getis & Ord 1992). Moran's I index cannot distinguish between high- and low-value clustering, but the General G-Statistic can. On the other hand, the General G-Statistic is appropriate only when there is positive spatial autocorrelation, as it detects the presence of clusters. If the General G-Statistic does not produce statistically significant results, we cannot reject the null hypothesis of complete spatial randomness. Negative spatial autocorrelation might still exist through a competitive process (i.e., where high values and low values of the same variable are nearby). As a result, the General G-Statistic is more an index of spatial association, or an index of positive spatial autocorrelation, than a pure spatial autocorrelation index.
Here are some practical guidelines to follow:
•  The General G-Statistic works only with positive values.
•  A binary weights matrix is more appropriate for this statistic. It is thus recommended to use a fixed distance band, polygon contiguity, k-nearest neighbors or Delaunay triangulation, all of which produce binary weighting schemes. For example, if we set a fixed distance of 2 km, each object inside this distance will be a neighbor and have a weight of 1. All objects lying further away than 2 km will not be neighbors and will have zero weight. In this case, row standardization is not necessary, as the weights already lie in a 0-to-1 range.
•  When we use binary weighting and a fixed distance, the size of the polygons might matter. For example, if large polygons tend to have lower values for the attribute being studied (e.g., population density) than the small polygons, we might obtain higher observed values of the General G-Statistic because more small polygons create pairs at the distance set. We would thus obtain higher z-values and stronger clustering results than the real situation would justify.
•  It is more common to use the local version of the General G-Statistic, as it provides the exact locations of the clusters. There are two versions of the local G-statistic: Gi and Gi*.
Finally, potential case studies include
•  Analyzing PM2.5 in an urban environment (environmental analysis)
•  Analyzing educational patterns (demographic analysis)
•  Analyzing house rents (economic analysis)
4.3 Incremental Spatial Autocorrelation
Definition
Incremental spatial autocorrelation is a method based on the global Moran's I index that tests for the presence of spatial autocorrelation at a range of distance bands (ESRI).
Why Use
It is used to approximate the appropriate scale of analysis (the appropriate analysis distance). Instead of arbitrarily selecting distance bands, this method identifies an appropriate fixed distance band for which spatial autocorrelation is most pronounced. In other words, it allows us to identify the farthest distance at which an object still has a significant impact on another one. After the appropriate scale of analysis is established, local spatial autocorrelation indices and other spatial statistics can be calculated more accurately.
Interpretation
Incremental spatial autocorrelation calculates global Moran's I for a series of incremental distances (see Figure 4.3). For each distance increment, the method produces global Moran's I, the expected I, the variance, a z-score and a p-value. With this method, we can plot a graph of z-scores over increasing distance. z-score peaks reflect distances at which a clustering process seems to be occurring. The higher the z-score, the stronger the clustering process at that distance. By locating the peak in the graph, the z-score and the corresponding distance, we can better define the distance band, which can be used in many spatial statistics such as hot spot analysis (see Section …). The distance at the first peak of the z-score graph is often selected as the appropriate scale for further analysis (but this is not always the case; see the following discussion).
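The procedure can be sketched end to end in plain Python. As a simplification, the snippet scores each distance band with a permutation-based z-score (shuffling the values over the fixed locations) instead of the analytical variance that GIS tools compute; all helper names and the two-cluster toy dataset are ours.

```python
# Simplified incremental spatial autocorrelation: Moran's I over a series of
# fixed distance bands, each scored with a permutation-based z-score.
import random

def morans_i(values, w):
    """Global Moran's I (Eq. 4.1) for a weights dict over ordered pairs."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s0 = sum(w.values())
    cross = sum(wij * dev[i] * dev[j] for (i, j), wij in w.items())
    return (n / s0) * cross / sum(d * d for d in dev)

def band_weights(coords, d):
    """Binary weights: 1 if two distinct points lie within distance d."""
    w = {}
    for i, (xi, yi) in enumerate(coords):
        for j, (xj, yj) in enumerate(coords):
            if i != j and (xi - xj) ** 2 + (yi - yj) ** 2 <= d * d:
                w[(i, j)] = 1.0
    return w

def incremental_autocorrelation(coords, values, distances, perms=199, seed=0):
    """Return (distance, z-score) pairs; a peak suggests a scale of analysis."""
    rng = random.Random(seed)
    results = []
    for d in distances:
        w = band_weights(coords, d)
        if not w:                       # band too small: no neighbors yet
            continue
        obs = morans_i(values, w)
        sims = []
        for _ in range(perms):          # shuffle values over fixed locations
            perm = values[:]
            rng.shuffle(perm)
            sims.append(morans_i(perm, w))
        mean = sum(sims) / perms
        sd = (sum((s - mean) ** 2 for s in sims) / perms) ** 0.5
        results.append((d, (obs - mean) / sd))
    return results

# Two compact point clusters about five units apart, one with high values:
coords = [(0, 0), (0.5, 0.2), (0.9, 0.7), (0.2, 1.0),
          (5, 5), (5.4, 5.1), (5.8, 5.6), (5.2, 5.9)]
values = [10, 11, 12, 10, 1, 2, 1, 2]
for d, z in incremental_autocorrelation(coords, values, [1.5, 3.0, 4.5, 6.0]):
    print(d, round(z, 2))
```

Because similar values sit within about one unit of each other, the small distance bands capture only within-cluster pairs and produce strongly positive z-scores, which is the peak behavior the graph in Figure 4.3 illustrates.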
Discussion and Practical Guidelines
Selecting the appropriate scale of analysis for the problem at hand is one of the most challenging tasks in spatial analysis. The scale of analysis defines the size and shape of the neighborhoods for which spatial statistics are calculated and is closely related to the problem in question (see Section 1.3). Researchers and analysts lacking an in-depth understanding of spatial statistics often tend to apply spatial statistics tools with the predefined default values (e.g., the distance band) of the software being used. The uncritical selection of spatial parameters (e.g., the distance of analysis) leads to incorrect estimates and conclusions. The difficulty is not running a software tool but setting it up properly and interpreting the results based on statistical theory. The graph produced by incremental spatial autocorrelation allows for a statistically sound estimation of the scale of analysis, which is superior to an arbitrary or intuitive selection.
It is quite common for more than one peak to occur. Different distance bands (peaks) might reveal underlying processes at different scales of analysis. Hypothetically, unemployment clustering that is statistically significant at two peaks, one at a few hundred meters and one at over a kilometer, reflects patterns of clustering at both the census block level and the postcode level. If we are interested only in the census block level, we could apply the smaller distance in our analysis. Greater distances reflect broader, regional trends (e.g., east to west), while smaller distances reflect local trends (e.g., between neighborhoods). For example, the scale of analysis is usually small when the analysis concerns children going to school (usually close to their homes), while the scale is larger when analyzing commuting patterns.

Figure 4.3 z-scores over incremental distance.
There are some practical guidelines regarding this method. An initial distance (the value from which distances start incrementing) should be set. The default initial distance should ensure that each object has at least one neighbor. If locational outliers (objects lying far away from the others) exist, however, the initial distance calculated this way may be excessively large. A large initial distance results in a graph with no peaks, simply because the peak lies before the beginning distance. In addition, an increment distance should also be set. We can use either a distance increment that fits the needs of the study or the average distance to each feature's nearest neighbor (usually the software's default value). Locational outliers may distort this value, leading to very large increments that might not be representative. For this reason, before running incremental spatial autocorrelation, we should check if locational outliers exist.
To define the appropriate scale of analysis, the following procedure can be applied:
Step 1: Check for locational outliers (see Exercise 3.4). If no locational outliers exist, perform incremental spatial autocorrelation as described earlier. If locational outliers exist, go to step 2.
Step 2: Select all features except the outliers and perform incremental spatial autocorrelation (only for the selected features).
Step 3: Locate a peak and keep the relevant distance.
Step 4: Create a spatial weights matrix for the entire dataset (including the locational outliers) with the distance defined in step 3 (see Section 1.9). Set the Number of Neighbors parameter to a value so that each object has at least this number of neighbors.
Step 5: Run the spatial statistic (e.g., local Moran's I) using the spatial weights matrix created in step 4.
When locational outliers are removed, the z-score graph might change significantly, yielding a completely different scale of analysis. Through this procedure, spatial weights are calculated based on the distance threshold that results when outliers are removed. For objects that have no neighbors at this distance, the Number of Neighbors parameter is used instead. This practically means that outliers are treated differently but are still included in the study, so that they do not negatively impact the rest of the objects. As such, local spatial autocorrelation indices will be calculated based on the final weights matrix for all objects in the dataset.
Other practical guidelines include the following (ESRI):
•  There might be more than one peak. Each one reflects spatial autocorrelation at a different distance. We will typically select the one with the highest z-score, which is usually the first. We may also select a peak (distance) that better reflects the regional or local perspective of the problem at hand.
•  Select as the beginning distance one that ensures each object has at least one neighbor.
•  Look for locational outliers before setting the initial distance. A locational outlier inflates distance metrics, with negative impacts on the graph. Remove outliers, and then run the incremental spatial autocorrelation tool.
•  If there are no peaks, we can use smaller or larger distance increments. If we still cannot locate any peaks, we should avoid the incremental spatial autocorrelation method and rely on other criteria, or on prior knowledge, to define the appropriate distance band. We can also use optimized hot spot analysis, which enables distance band definition even if no peaks exist (see Section …).
•  The final distance selected should provide an adequate number of neighbors for each feature and an appropriate scale of analysis.
4.4 Local Spatial Autocorrelation
Global indices of spatial autocorrelation identify whether there is clustering in a variable's values, but they do not indicate where the clusters are located. To determine the location and magnitude of spatial autocorrelation, we have to use local indices instead. Local Moran's I and the local Getis-Ord Gi* are the most widely used local indices of spatial autocorrelation.
4.4.1 Local Moran's I (Cluster and Outlier Analysis)
Definition
Local Moran's I is an inferential spatial statistic used to calculate local spatial autocorrelation. For n spatial objects, the local Moran's I of object i is given as (4.5) (Anselin 1995):

    I_i = \frac{x_i - \bar{X}}{m_2} \sum_{j} w_{ij}\,(x_j - \bar{X})    (4.5)

    m_2 = \frac{\sum_{i}(x_i - \bar{X})^2}{n}    (4.6)
where
n is the total number of observations (spatial objects)
x_i is the attribute value of feature i
x_j is the attribute value of feature j
X̄ is the mean of this attribute
w_ij is the spatial weight between features i and j
m_2 is a constant for all locations; it is a consistent, but not unbiased, estimate of the variance (Anselin 1995, p. 99)

Tip: Keep in mind that m_2 is actually a scalar that does not affect the significance of the metric, as it is the same for all locations (Anselin 1995). In some cases, the formula for this scalar may change slightly. For example, ArcGIS and GeoDa use n - 1 in the denominator instead of n (de Smith et al.).
Why Use
Local Moran's I can be used for an attribute to (a) identify whether a clustering of high or low values exists and (b) trace spatial outliers (Grekousis & Gialis).
Interpretation
The local Moran's I index is interpreted based on the expected value, a pseudo p-value and a z-score under the null hypothesis of no spatial autocorrelation (complete spatial randomness). The expected value for a random pattern is (4.7) (Anselin 1995, p. 99):

    E(I_i) = \frac{-\sum_{j} w_{ij}}{n-1}    (4.7)
where
n denotes the number of spatial entities
w_ij is the spatial weight between features i and j
The expected value is the value that would have resulted if the geographical distribution of the attribute's values were the result of complete spatial randomness. The observed index value is the local Moran's I value given by Equation (4.5).
Positive local Moran's I values (observed) significantly larger than the expected value indicate potential clustering and positive spatial autocorrelation. Negative local Moran's I values (observed) significantly smaller than the expected value indicate the potential presence of spatial outliers and negative spatial autocorrelation. Values close to the expected value indicate no autocorrelation. To reach a conclusion regarding the presence of spatial autocorrelation, we should evaluate the difference between the expected and observed values based on the z-score and the p-value. Using these two metrics, we assess if this difference is statistically significant.
•  If the p-value is large (usually p > 0.05), the results are not statistically significant (even if the difference is large), and we cannot reject the null hypothesis (see Table 4.1). The interpretation is that, due to a lack of sufficient evidence, we cannot reject the null hypothesis that the spatial distribution of the values is the result of complete spatial randomness.
•  A small p-value (usually p < 0.05) indicates that we can reject the null hypothesis of complete spatial randomness and accept that spatial autocorrelation exists. In this case:
   (a) If the z-value is positive, we have positive spatial autocorrelation and clustering.
   (b) If the z-value is negative, we have negative spatial autocorrelation, and spatial outliers may exist, especially for low z-score values.
A high positive z-score (e.g., greater than +2.60) for a spatial entity means that the neighboring spatial entities have similar values. If the values are high, then High-High clusters are formed, meaning that spatial entities with high values (of a specific variable) are surrounded by spatial entities with high values (of the same variable). If the values are low, then Low-Low clusters are formed, meaning that spatial entities with low values are surrounded by spatial entities with low values. Note that the spatial clusters formed in High-High or Low-Low arrangements depict only the core of a real cluster. This happens because the statistical value for each location is calculated based on the neighboring values. Thus, the locations (e.g., polygons) at the periphery of a cluster might not be assigned to a High-High or Low-Low cluster.
A low negative z-score (e.g., less than -2.60) for a spatial entity indicates dissimilar nearby attribute values and potential spatial outliers. If the spatial entity has a low attribute value, it is surrounded by features with high values, creating a Low-High arrangement. If it has a high attribute value, it is surrounded by features with low values, creating a High-Low arrangement.
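Equations (4.5) and (4.6), together with the High-High/Low-Low/High-Low/Low-High reading above, can be sketched as follows. The function names and the ten-site transect are ours, the weights are row standardized, and real tools would also attach a significance test before assigning any label.

```python
# Illustrative sketch of local Moran's I (Eqs. 4.5-4.6) with cluster/outlier
# labels; neighbors maps each index i to a list of its neighbor indices.

def local_morans(values, neighbors):
    """Local Moran's I_i with row-standardized (mean-of-neighbors) weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    m2 = sum(d * d for d in dev) / n                   # Eq. (4.6)
    out = []
    for i in range(n):
        lag = sum(dev[j] for j in neighbors[i]) / len(neighbors[i])
        out.append(dev[i] * lag / m2)                  # Eq. (4.5)
    return out

def label(values, neighbors):
    """Quadrant-style label per feature (significance testing omitted)."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    labels = []
    for i in range(n):
        lag = sum(dev[j] for j in neighbors[i]) / len(neighbors[i])
        if dev[i] >= 0:
            labels.append("High-High" if lag >= 0 else "High-Low")
        else:
            labels.append("Low-High" if lag >= 0 else "Low-Low")
    return labels

# A transect with a high cluster, a low outlier inside it, and a low tail:
values = [9, 9, 9, 1, 9, 2, 1, 2, 1, 2]
nbrs = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
Ii = local_morans(values, nbrs)
print(label(values, nbrs))  # index 3 comes out "Low-High": a spatial outlier
print(Ii[3] < 0)            # negative local I for the outlier
```

Index 3 (a low value between two high values) gets a negative I_i and a Low-High label, while the high values at the start and the low tail at the end form High-High and Low-Low cores, illustrating why only cluster cores, not peripheries, receive these labels.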
Discussion and Practical Guidelines
Even with complete spatial randomness, clustering or outliers might exist due
to randomness. To overcome this problem, we use a Monte Carlo random
permutation procedure. Permutations are used to estimate the likelihood of
generating, via complete spatial randomness, a spatial arrangement of values
similar to the observed one. Using Monte Carlo, we generate multiple random
patterns and then compare the results to those of the local Moran's I of the
original dataset. By inspecting the reference distribution, we assess how unusual
the observed value would be in relation to this randomized benchmark (see
Figure 4.4).

Table 4.1 Interpretation of Moran's I p-values and z-scores. z-score values are
indicative and can be differentiated based on the data.

p-value             z-score   Interpretation
> α (e.g., 0.05)              Cannot reject the null hypothesis of complete
                              spatial randomness.
< α (e.g., 0.05)    z < 0     Negative spatial autocorrelation. Low negative
                              values (e.g., z < −2.60) are an indication of
                              spatial outlier presence.
                    z > 0     Positive spatial autocorrelation: clustered
                              pattern. Large positive values (e.g., z > 2.60)
                              indicate intense clustering of either low (cold
                              spots) or high (hot spots) values.
For each permutation (i.e., of a total of M), the values (of the attribute
variable) are randomly rearranged around each feature, and the local Moran's I
index is calculated. A reference distribution of the local Moran's I index values
is then created (see Figure 4.4). The reference distribution should be centered
at around zero, as it is supposed to be the result of complete spatial random-
ness with no spatial autocorrelation. The range of local Moran's I index values,
which vary due to randomness, is depicted on the x-axis. If the observed local
Moran's I value lies far away from the reference distribution then, in relation to
the z-score obtained (which quantifies the distance from the mean), we can
reject the null hypothesis of complete spatial randomness and accept that the
spatial autocorrelation observed is statistically significant.
A pseudo p-value, p = (R + 1) / (M + 1), is calculated as the proportion of the
number of times (R) that the local Moran's I values generated by the
permutations are equal to or larger than the observed local Moran's I to the
number of permutations (M; Anselin). Typically, for 999 permutations, the
smallest pseudo p-value is 0.001; for 99 permutations, it is 0.01. A pseudo
p-value cannot be interpreted as a typical p-value, as it is a summary of the
reference distribution (Anselin). A pseudo p-value of 0.01 (i.e.,
p = (0 + 1)/(99 + 1) = 0.01), for example, means that none (zero) of the 99
random patterns yielded a local Moran's I value equal to or more extreme than
the observed one. In other words, no pattern exhibited clustering (or dispersion)
equal to or larger than the observed one.

Figure 4.4 Permutation reference distribution of local Moran's I values under
complete spatial randomness. In this example, the observed local Moran's I
value lies far away from the rest of the values and from the expected
(theoretical) Moran's I value E(I), so it is highly significant and not a result
of spatial randomness: none of the random patterns' local Moran's I values
surpassed the observed one.
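The permutation procedure for a single feature can be sketched as follows. This is an illustrative Python/numpy version under simplifying assumptions (row-standardized weights, so the local statistic is z_i times the mean of the neighbors' z-values): the value at i is held fixed while its neighbors' values are drawn from the remaining observations, and the pseudo p-value is (R + 1)/(M + 1).

```python
import numpy as np

def local_moran_pseudo_p(x, neighbors, i, permutations=999, seed=42):
    """Conditional permutation test for local Moran's I at feature i.
    The value at i stays fixed; neighboring values are drawn at random
    from the remaining observations and I_i is recomputed each time.
    Returns (observed I_i, pseudo p-value = (R + 1) / (M + 1))."""
    rng = np.random.default_rng(seed)
    z = (x - x.mean()) / x.std()
    k = len(neighbors)                        # number of neighbors of i
    observed = z[i] * z[neighbors].mean()     # I_i with row-standardized weights
    others = np.delete(z, i)                  # candidate values for the neighbors
    count = 0                                 # R: permuted I_i >= observed
    for _ in range(permutations):
        sample = rng.choice(others, size=k, replace=False)
        if z[i] * sample.mean() >= observed:
            count += 1
    return observed, (count + 1) / (permutations + 1)
```

Because the observed arrangement itself always counts once, the smallest attainable pseudo p-value with M = 999 permutations is 1/1000.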
Practical guidelines include the following (ESRI 2018a):
• Results are reliable for at least 30 spatial objects.
• We cannot perform this test for point events (e.g., points as crime
incidents, without any attribute fields attached). Nevertheless, we can
aggregate data into polygons and then continue with the analysis in the
usual fashion (see the discussion of hot spot analysis below).
• Each feature should have at least one neighbor.
• No feature should have all other features as neighbors.
• When values are skewed, each feature should have around eight neigh-
bors or more.
• The conceptualization of spatial relationships, distance bands and dis-
tance functions should be done carefully.
• The false discovery rate (see Section 4.6) can be used to account for
multiple comparison problems and spatial dependence.
Potential case studies include:
• Analyzing unemployment distribution
• Analyzing income inequalities
• Analyzing house values

Optimized Outlier Analysis
Definition
Optimized outlier analysis is a procedure used to optimally select the
parameters of the local Moran's I index (ESRI 2018a). Similar to local Moran's
I, it locates clusters of either high or low values and traces spatial outliers.

Why Use
Optimized outlier analysis is used to overcome the difficulties of setting the
parameters of the local Moran's I index. It performs an automated preliminary
analysis of the data to ensure optimal results. The method is used to
(a) Identify how many locational outliers exist (if any).
(b) Estimate the distance band at which the spatial autocorrelation is most
pronounced (the scale of analysis) through incremental spatial autocorrelation.
(c) Adjust for spatial dependence and multiple testing through the false
discovery rate correction method (see Section 4.6).

(d) Handle point events with no variables attached. These events are auto-
matically aggregated into weighted features within some regions (i.e., a
grid; see the discussion of hot spot analysis below). The weighted variable
is then analyzed in the usual fashion.

Interpretation
Optimized outlier analysis applies the local Moran's I index, and the results
can be interpreted accordingly (see the interpretation of local Moran's I above).
Discussion and Practical Guidelines
Optimized outlier analysis applies incremental spatial autocorrelation to define
the scale of analysis (see Section 4.3). The distance band selected is the one at
which a peak occurs in the related graph. If multiple peaks are found, the
distance related to the first peak is usually selected. If no peaks occur,
optimized outlier analysis applies a different procedure. The spatial distribution
of the features is analyzed by calculating the average distance at which each
feature has K neighbors. K is defined as 5% of the total number (n) of features
(K = 0.05 × n). K is adjusted so that it ranges between 3 and 30. If the average
distance that ensures K neighbors for each feature is larger than one standard
distance, the distance band is set to one standard distance (ESRI 2018a). If it
is not, the K-neighbor average distance reflects the appropriate scale of
analysis. Finally, optimized outlier analysis is effective even for data samples.
It is also effective in cases of oversampling, as the associated tools have more
data with which to compute accurate results.
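Under the stated assumptions (K clamped to [3, 30], and the standard distance taken as the RMS distance of the points to their mean center, as defined earlier in the book), the fallback rule can be sketched as:

```python
import numpy as np

def neighbor_count_k(n):
    """Fallback rule when no peak occurs: K = 5% of the n features,
    adjusted so that it ranges between 3 and 30."""
    return int(min(max(round(0.05 * n), 3), 30))

def standard_distance(xs, ys):
    """Standard distance: RMS distance of the points to their mean center."""
    return float(np.sqrt(((xs - xs.mean()) ** 2 + (ys - ys.mean()) ** 2).mean()))
```

The distance band is then the smaller of the average distance that gives each feature K neighbors and one standard distance.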
Getis-Ord Gi and Gi* (Hot Spot Analysis)

Definition
The Getis-Ord Gi index and the Gi* index (pronounced G-i-star) comprise a
family of statistics that identify statistically significant clusters of high values
(hot spots) and clusters of low values (cold spots) and are used as measures of
spatial association (Getis & Ord; Ord & Getis; O'Sullivan & Unwin). The process
is also named hot spot analysis. The Gi index is given as (4.8):

    Gi(d) = Σj wij(d) xj / Σj=1..n xj,   j ≠ i        (4.8)

where
d is the estimated range of observed spatial autocorrelation
Σj wij(d) is the sum of weights for j ≠ i within distance d
n is the total number of observations
xj is the attribute value of feature j

The Getis-Ord Gi* index is given as (4.9):

    Gi*(d) = Σj wij(d) xj / Σj=1..n xj                (4.9)

Note that, in contrast to Gi, in Gi* the restriction j ≠ i is lifted. In other
words, the index takes into account the attribute value xi at location i.
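A direct transcription of Eqs. (4.8) and (4.9) makes the j ≠ i difference explicit. This sketch covers the unstandardized form only; the software implementations also compute a z-score for each statistic, which is omitted here.

```python
import numpy as np

def getis_ord(x, w_binary, i):
    """Gi and Gi* for feature i (unstandardized form of Eqs. 4.8-4.9).
    `w_binary` is a 0/1 distance-band or contiguity matrix; Gi excludes
    the value at i itself, while Gi* includes it."""
    n = len(x)
    mask = np.arange(n) != i
    g_i = (w_binary[i, mask] @ x[mask]) / x[mask].sum()   # sums over j != i
    w_star = w_binary[i].copy()
    w_star[i] = 1                                         # self-neighbor for Gi*
    g_i_star = (w_star @ x) / x.sum()                     # sums over all j
    return g_i, g_i_star
```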
Why Use
Hot spot analysis identifies whether low or high values of a variable are
spatially clustered, creating cold spots or hot spots, respectively.
Interpretation
For each polygon, a z-score is calculated along with a p-value to assess
the statistical significance of the results. The null hypothesis is that there is
complete spatial randomness of the values associated with the features. A high
positive z-score with a small p-value indicates spatial clustering of high values
(i.e., a hot spot), whereas a low negative z-score with a small p-value reveals
the presence of cold spots (spatial clustering of low values). In both cases,
there is positive spatial autocorrelation. The higher the absolute z-score
(either positive or negative), the more intense the clustering. z-scores near
zero typically indicate no spatial clustering. When p-values are larger than
0.05 (or larger than another established significance level), the null hypothesis
cannot be rejected, and the results are not statistically significant.
Nonsignificant results mean that there is no indication of clustering, as the
process at hand might be random. The results can be rendered in a map with
three confidence-level classes (99%, 95% or 90%) for hot spot polygons, three
classes (99%, 95% or 90%) for cold spot polygons and another class for
rendering polygons with nonsignificant results.
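The three rendering classes follow from the usual two-tailed standard normal critical values (|z| ≥ 1.65, 1.96 and 2.58 for 90%, 95% and 99%); here is a small sketch of that mapping, with the thresholds assumed as the conventional values rather than taken from the book:

```python
def confidence_class(z):
    """Map a Gi* z-score to a map-rendering class using the conventional
    two-tailed critical values (1.65 -> 90%, 1.96 -> 95%, 2.58 -> 99%)."""
    sign = "hot" if z > 0 else "cold"
    a = abs(z)
    if a >= 2.58:
        return f"{sign} spot, 99% confidence"
    if a >= 1.96:
        return f"{sign} spot, 95% confidence"
    if a >= 1.65:
        return f"{sign} spot, 90% confidence"
    return "not significant"
```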
Discussion and Practical Guidelines
Gi* is used more widely than Gi. The use of Gi* is also linked to hot spot
analysis. Hot spot analysis is mainly used to identify whether clusters of
values of a specific variable are formed in space (Grekousis). It is most
commonly used with polygon features. For point features, though, it is more
helpful to study the intensity of the objects rather than a specific attribute.
In this respect, we use hot spot analysis to identify whether hot spots or cold
spots of events' intensity exist. Such analysis should begin by aggregating
points into some regions (e.g., postcodes, census tracts). This can be easily
done by overlaying the relevant (administrative) polygon layers and applying
typical GIS techniques (such as a spatial join to count how many points lie
within each polygon). Alternatively, we can create a grid (in the absence of a
polygon layer, or to avoid to some extent the modifiable areal unit problem;
see Chapter 1) by using a fishnet tool and then perform a spatial join. The grid
size should be set in such a way that most of the grid cells have more than one
incident. The new attribute field created, containing the number of points per
polygon/grid cell, can be used as the attribute field to be analyzed, similarly
to any other polygon layer.
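The fishnet-and-count step can be sketched with numpy alone. This is an illustrative stand-in for the GIS fishnet and spatial-join tools; the function name and the edge handling are assumptions.

```python
import numpy as np

def points_to_grid_counts(xs, ys, cell_size):
    """Aggregate point events (e.g., crime incidents) into a fishnet grid
    and return the per-cell counts, ready to be analyzed as an attribute
    field in hot spot analysis."""
    x_edges = np.arange(xs.min(), xs.max() + cell_size, cell_size)
    y_edges = np.arange(ys.min(), ys.max() + cell_size, cell_size)
    counts, _, _ = np.histogram2d(xs, ys, bins=[x_edges, y_edges])
    return counts
```

The returned counts array can then be treated as the attribute field of a polygon (grid-cell) layer.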
Some practical guidelines for Gi* (ESRI 2018a):
• Results are reliable if we have at least 30 objects.
• A fixed distance band is recommended for the Gi* index. The appropriate
distance should be determined by using incremental spatial autocorrelation
or optimized hot spot analysis (see the next section). In the case of
locational outliers, a fixed distance band can be combined with a minimum
number of neighbors per spatial feature. In this case, when the fixed
distance band leaves some polygons with no neighbors, the minimum number
of neighbors ensures that all polygons will have at least a specific number
of neighbors.
• The false discovery rate (see Section 4.6) can also be used here, as in
the case of local Moran's I.
• This index cannot be applied to point objects (e.g., crime incidents)
without any attribute fields attached. However, we can aggregate data
by using a spatial join to polygons and then continue with the analysis.
• The spatial relationships, distance bands and distance functions used
should be conceptualized carefully.
Potential case studies include:
• Human geography/demographics: Are there any areas where unemployment
rates form spatial clusters?
• Economic geography: How is income spatially distributed? Are there cold
or hot spots?
• Health analysis: Are there any unusual patterns in heart attacks?
• Voting pattern analysis: Do people in favor of a specific party cluster
together?
Optimized Hot Spot Analysis

Definition
Optimized hot spot analysis is a procedure used to optimally select the
parameters of the Getis-Ord Gi* index (ESRI 2018a). Similar to Getis-Ord Gi*,
it locates spatial clusters of low values (cold spots) and spatial clusters of
high values (hot spots).

Why Use
Optimized hot spot analysis is used to overcome the difficulties of setting the
parameters of the Getis-Ord Gi* index. It performs hot spot analysis using the
Getis-Ord Gi* index in an automated way to ensure optimal results. The method
is used to

(a) Identify how many locational outliers exist (if any).
(b) Estimate the distance band at which the spatial autocorrelation is most
pronounced (the scale of analysis).
(c) Adjust for spatial dependence and multiple testing through the false
discovery rate correction method (see Section 4.6).
(d) Handle point events with no variables attached. These events are auto-
matically aggregated into weighted features within some regions (i.e., a
grid; see the discussion of hot spot analysis above). The weighted variable
is then analyzed in the usual fashion.

Interpretation
Optimized hot spot analysis applies the Getis-Ord Gi* index, and the results
can be interpreted in the same way as the Getis-Ord Gi* results (see above).
Discussion and Practical Guidelines
Optimized hot spot analysis applies incremental spatial autocorrelation to
define the scale of analysis (see Section 4.3). This is done in the same way as
described for optimized outlier analysis.
As mentioned, for point features with no other attributes attached, optimized
hot spot analysis aggregates the points into zones and identifies potential
event concentration (clustering) or dispersion across space. It can thus be
regarded as an alternative approach to point pattern analysis that indicates
whether spatial autocorrelation among point events is evident. Finally, the
scale of analysis resulting from optimized hot spot analysis can be applied in
kernel density estimation as an alternative way to select the bandwidth h.
4.5 Space–Time Correlation Analysis

4.5.1 Bivariate Moran's I for Space–Time Correlation

Definition
Bivariate Moran's I measures the degree to which a variable at a specific
location is correlated with the spatial lag (the average value at nearby
locations) of a different variable (Anselin). A special case of Bivariate
Moran's I (and a more useful one) occurs when a single variable (instead of two
different variables) is used for two different time stamps. It then measures
the degree of spatiotemporal correlation of a single variable.
Why Use
The Bivariate Moran's I index is used to assess how the linear association
(positive or negative) of two distinct variables varies in space. The extension
of Bivariate Moran's I to space–time correlation is used to trace whether
spatiotemporal autocorrelation exists for the same variable.
Interpretation
With row-standardized weights, Bivariate Moran's I for space–time correlation
lies between −1 and 1, with values close to zero indicating no correlation,
values close to 1 indicating strong positive correlation and values close to −1
indicating strong negative correlation. The significance of the statistic is
determined through a permutation approach.
Discussion and Practical Guidelines
Before discussing the bivariate extension to time (single variable), let us
briefly describe how the standard Bivariate Moran's I index works through the
Bivariate Moran scatter plot. As explained before, Bivariate Local Moran's I
analyzes two distinct variables. For instance, suppose we study the potential
association between income at a specific location and land prices in
surrounding areas. A Bivariate Moran's I scatter plot would relate the income
values for each location (Income, horizontal axis) to the average land value at
nearby locations (W_Land, vertical axis; see Figure 4.5A). However, bivariate
spatial correlation does not take into account the inherent correlation between
the two variables (i.e., income and land price at the same location; Anselin).
Leaving this correlation unaccounted for makes Bivariate Moran's I hard to
interpret; this often leads to incorrect conclusions, as the statistic may
overestimate the spatial effect of the correlation, which might merely be the
result of the same-location correlation (Anselin). Thus, Bivariate Moran's I is
more useful when time is included.
More analytically, a particular case of Bivariate Moran's I spatial correlation
occurs when the correlation is calculated for a variable X at a location with
Lag-X within a time interval. Lag-X is the average value of X at nearby
locations but at a previous time stamp (see Figure 4.5B). This is the Bivariate
Moran's I for space–time correlation. Conceptually, this approach describes how
neighboring values in a previous period affect the present value (Anselin). To
put it slightly differently, this approach explains how the value at a location
at a subsequent time is affected by the average values at nearby locations at a
previous time. It can be regarded as inward diffusion from the neighbors at
a specific point in time to the core in the future. Switching the selection of
variables on the scatter plot axes produces a scatter plot with X(t−1) on the
x-axis and the lag of X(t) on the y-axis. This approach studies how a location
at a previous time affects the values of nearby locations in the future. It can
be seen as outward diffusion originating from the core at a specific time to
the neighbors in the future (Anselin). Although this approach is formally
correct, the results might be misleading (Anselin), mainly because the notion
of spatial autocorrelation refers to how neighbors affect the value of a
central location and not the contrary (Anselin). These approaches are
slightly different, and the best one to use depends on the problem at hand and
the underlying process being studied.
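With row-standardized weights and standardized variables, the space–time statistic can be sketched as the average product of each location's current z-value with the spatial lag of its neighbors' z-values at the previous time stamp. This is an illustrative simplification; significance would still be assessed via permutations.

```python
import numpy as np

def spacetime_bivariate_moran(x_now, x_before, w):
    """Space-time Bivariate Moran's I sketch: correlation of a variable's
    standardized value at time t with the spatial lag (row-standardized
    weights w) of the same variable at time t-1 (inward diffusion)."""
    z_now = (x_now - x_now.mean()) / x_now.std()
    z_before = (x_before - x_before.mean()) / x_before.std()
    lag_before = w @ z_before          # neighbors' average z at time t-1
    return (z_now @ lag_before) / len(x_now)
```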
4.5.2 Differential Moran's I

Definition
For two time stamps, Differential Moran's I tests whether a variable's change
at a specific location is related to the change of the same variable in
neighboring locations.
Why Use
Differential Global or Local Moran's I is used to identify whether changes over
time are spatially clustered.
Interpretation
As with the interpretation of Moran's I, if a high change in a variable's value
between two time stamps for a specific location is accompanied by a high
change of the same variable in the surrounding area, there is positive
spatial autocorrelation of the High-High type (i.e., hot spots; see Figure 4.6).
In other words, the change of a variable's value at a specific location follows
a trend similar to the change of the same variable in the neighboring area. If
low changes in a variable's value between two time stamps are surrounded by
low changes, then a Low-Low type of cluster is formed (i.e., cold spots). If a
low change in a variable's value between two time stamps for a specific
location is accompanied by a high change of the same variable in the sur-
rounding area, there is negative spatial autocorrelation of the Low-High type.
A High-Low type of negative spatial autocorrelation emerges when high changes
are surrounded by low changes. In other words, the change at a location does
not follow a trend similar to that of the neighboring area. Spatiotemporal
outliers can also be traced in the case of High-Low or Low-High formations.

Figure 4.5 (A) Bivariate Moran's I for income and land price. The graph
indicates that income (x-variable) is correlated with the weighted average
value of land (W_Land) within its neighborhood. All variables are expressed in
standardized form, with zero mean and a variance of one. Spatial weights are
also row standardized (Anselin). (B) Spatiotemporal Bivariate Moran's I.
Moran's I calculates the correlation of variable X at a location with Lag-X at
the previous time stamp. The proper interpretation is that the income value at
a location (x-variable) is correlated with the weighted income value of its
neighbors at a previous time (Anselin).
Discussion and Practical Guidelines
In a spatiotemporal context, Differential Moran's I is more informative than
mapping the local Moran's I index of a variable separately for each time stamp.
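Following Figure 4.6, Differential Moran's I can be sketched as a global Moran's I computed on the change between the two time stamps. This is a simplification, assuming row-standardized weights and standardized changes.

```python
import numpy as np

def differential_moran(x_t, x_t1, w):
    """Differential Moran's I sketch: global Moran's I of the change
    d = x_t - x_t1, with row-standardized weights w. A high positive
    value indicates that changes cluster in space."""
    d = x_t - x_t1
    z = (d - d.mean()) / d.std()       # standardize the change
    return (z @ (w @ z)) / len(d)      # average product of z and its lag
```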
4.5.3 Emerging Hot Spot Analysis

Definition
Emerging hot spot analysis identifies spatiotemporal clusters in a spatial
dataset (point counts or attribute values) using the Getis-Ord Gi* statistic
(hot spot analysis; ESRI).

Figure 4.6 Differential Moran's I. The x-variable is the difference in the
variable between two time stamps. The y-variable is the spatial lag of this
difference (the weighted difference, calculated as the average difference for
the nearby locations). A high value of the statistic indicates that changes in
the variable cluster over time (and space).

Why Use
A cluster may exist throughout the entire period, diminish after a specific time
stamp, emerge after some other time stamp or disappear at some other point
in time. Emerging hot spot analysis is used to trace such types of hot spots or
cold spots through time.

Interpretation
Emerging hot spot analysis groups locations according to the definitions
presented in Table 4.2 (ESRI).

Discussion and Practical Guidelines
Emerging hot spot analysis is very useful for locating fluctuations in the
density, distribution and total count of events in temporal point data. For
example, crime events might concentrate in specific locations during the day
and in other locations during the night. Through appropriate analysis, measures
and related policies may be implemented to better handle the problems being
studied.
4.6 Multiple Comparisons Problem and Spatial Dependence

Multiple Comparisons Problem
The multiple comparisons problem, also known as the multiple testing problem,
is the problem of obtaining falsely significant results in multiple hypothesis
testing. Local spatial statistics rely on tests conducted for every single
spatial feature in the dataset. As multiple inferences (tests) are drawn for the
same set of spatial features, there is a probability that some results will be
declared statistically significant by chance, something that should be
controlled (Mitchell; Caldas de Castro & Singer; ESRI 2018c). The multiple
comparisons problem is a Type I error: rejecting the null hypothesis when, in
fact, it is true. In the context of geography, the more spatial objects there
are, the more likely it is that some will be misclassified as statistically
significant when a hypothesis is tested.
For example, if we run a test to detect spatial outliers and a spatial feature
gets a p-value of 0.05, this spatial feature would be declared a spatial outlier
based on statistically significant results at the 95% confidence level. However,
there would be a 5% chance that this feature is not an outlier. When we run
multiple statistical tests on a few spatial features, the multiple comparisons
problem is not very severe. For many objects, however, the multiple comparisons
problem is significant. Detecting spatial outliers in 10,000 spatial features
entails 10,000 hypothesis tests and 10,000 p-values (one test and one p-value
per object). At a 95% confidence level, the likelihood of each object passing
the test by chance is 5%. In other words, 500 objects might be found to be
significant by chance, largely altering the conclusions to be drawn.
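This arithmetic is easy to verify by simulation: under a true null hypothesis, p-values are uniformly distributed, so roughly 5% of 10,000 features pass an uncorrected 0.05 threshold by chance (the exact count varies with the random seed).

```python
import numpy as np

# Under a true null hypothesis, p-values are uniform on [0, 1], so with
# 10,000 features and alpha = 0.05 we expect about 500 "significant"
# results from chance alone.
rng = np.random.default_rng(0)
p_values = rng.uniform(0.0, 1.0, size=10_000)   # one test per feature
false_positives = int((p_values < 0.05).sum())
```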

Spatial Dependency
According to the first law of geography, spatial entities that are closer tend
to be more similar than those lying further away. This is what we call spatial
dependency (see Section 1.3). In local spatial statistics, spatial dependency
is highly likely to seem more evident than it really is. The reason is that
local spatial statistics are calculated using the neighboring values of each
spatial feature (through a spatial weights matrix). However, features that are
near each other are likely to share common neighbors as well, leading to an
overestimation of spatial dependence and an artificial inflation of statistical
significance.

Table 4.2 Emerging hot spot analysis groups (pattern: description when
statistically significant).

No pattern: There is no indication of any hot or cold spots throughout the
entire study period.
New hot spot: This location is a new hot spot at the last time stamp. Before
that, there was no pattern at this location.
Consecutive hot spot: This location is a hot spot for the final time steps.
Intensifying hot spot: This location is a hot spot for 90% of the time
intervals, including the last step. In addition, the intensity of clustering
is increasing in each time step.
Persistent hot spot: This location is a hot spot for at least 90% of the time
intervals, with no notable fluctuations in the intensity of clustering.
Diminishing hot spot: A location that is a hot spot for at least 90% of the
time intervals, including the last one. The intensity of clustering is
decreasing overall.
Sporadic hot spot: A location that is a hot spot for less than 90% of the time
intervals. For none of the time intervals has this location been a
statistically significant cold spot.
Oscillating hot spot: A hot spot for the final time step that has also been a
cold spot in some other time intervals.
Historical hot spot: This location is not a hot spot for the most recent time
intervals, but it has been traced as a statistically significant hot spot for
at least 90% of the past time intervals.
New cold spot: This location is a new cold spot at the last time stamp. Before
that, there was no pattern at this location.
Consecutive cold spot: This location is a cold spot for the final time steps.
Intensifying cold spot: This location is a cold spot for 90% of the time
intervals, including the last step. In addition, the intensity of clustering
is increasing in each time step.
Persistent cold spot: This location is a cold spot for at least 90% of the
time intervals, with no notable fluctuations in the intensity of clustering.
Diminishing cold spot: A location that is a cold spot for at least 90% of the
time intervals, including the last one. The intensity of clustering is
decreasing overall.
Sporadic cold spot: A location that is a cold spot for less than 90% of the
time intervals. For none of the time intervals has this location been a
statistically significant hot spot.
Oscillating cold spot: A cold spot for the final time step that has also been
a hot spot in some other time intervals.
Historical cold spot: This location is not a cold spot for the most recent
time intervals, but it has been traced as a statistically significant cold
spot for at least 90% of the past time intervals.
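A deliberately simplified sketch of a few of the Table 4.2 categories follows. It ignores the consecutive/intensifying/diminishing distinctions, which also require the trend in clustering intensity; the input is an assumed per-time-step label sequence, not the ESRI tool's output format.

```python
def classify_hot_spot(history):
    """Simplified sketch of a few Table 4.2 categories. `history` is a
    list of per-time-step labels: 'hot', 'cold' or 'none' (statistically
    significant hot spot, cold spot, or neither)."""
    hot_share = history.count("hot") / len(history)
    if history[-1] == "hot":
        if all(h == "none" for h in history[:-1]):
            return "new hot spot"          # hot only at the last time stamp
        if "cold" in history:
            return "oscillating hot spot"  # hot now, cold at some other step
        if hot_share >= 0.9:
            return "persistent hot spot"   # hot for at least 90% of steps
        return "sporadic hot spot"         # on-again/off-again, never cold
    if hot_share >= 0.9:
        return "historical hot spot"       # mostly hot, but not recently
    return "no pattern"
```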
Dealing with the Multiple Comparisons Problem and
Spatial Dependence
Two approaches can be used to handle the multiple comparisons problem and
spatial dependence:

Bonferroni correction (Bonferroni): This correction divides the alpha
significance level by the number of tests (in spatial analysis, this equals
the number of features). For example, for ten tests and alpha = 0.05, tests
with p-values smaller than 0.05/10 = 0.005 are statistically significant. In
other words, the p-value at which a result is declared statistically signifi-
cant is stricter.

False discovery rate (FDR) correction: FDR correction has been particularly
influential in statistics and has been applied in other research areas as
well, e.g., genetics and biochemistry (Benjamini & Hochberg; Benjamini).
False discovery rate correction is used to account for both spatial dependence
and the multiple comparisons problem. It lowers the p-value at which a
statistic is regarded as significant. FDR correction estimates the number of
objects misclassified (false positives, i.e., wrongly rejecting the null
hypothesis) for a given confidence level and then adjusts the critical
p-value. Statistically significant p-values (less than alpha) are ranked
from smallest (strongest) to largest (weakest). FDR calculates the expected
error in rejecting the null hypothesis (false positives) and, based on this
estimate, the weakest objects are eliminated. Within the spatial statistics
context, applying FDR correction reduces the number of features with
statistically significant p-values.
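Both corrections can be sketched generically (these are textbook implementations, not the ESRI tools):

```python
import numpy as np

def bonferroni(p_values, alpha=0.05):
    """Bonferroni: a test is significant if p < alpha / (number of tests)."""
    p = np.asarray(p_values)
    return p < alpha / len(p)

def fdr_bh(p_values, alpha=0.05):
    """Benjamini-Hochberg FDR: rank the p-values, find the largest k with
    p_(k) <= (k/m) * alpha, and declare the k smallest significant."""
    p = np.asarray(p_values)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    significant = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])      # largest rank passing the test
        significant[order[: k + 1]] = True
    return significant
```

On the same p-values, FDR typically retains more significant results than Bonferroni, which is why it is the less conservative of the two corrections.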
Many statisticians recommend ignoring both the multiple comparisons prob-
lem and the spatial dependence problem. For a small number of spatial objects,
few objects are likely to be misclassified, so correction may not be necessary.
As the number of objects increases, correction should be considered. As
software tools for applying corrections are readily available, it is more
rational to use them and then compare the results with the noncorrected
outputs. For example, applying FDR correction in hot spot analysis will
probably reduce the number of features assigned to clusters relative to hot
spot analysis without correction. It is advisable to examine which features are
excluded, along with their attributes and their neighbors. Finally, we should
keep in mind that, even with corrections, we might still obtain false results.
Whether the FDR or the Bonferroni correction should be applied in the
calculation of spatial statistics depends on the problem, knowledge of the
study area, the intuition of the researcher and the results produced with and
without corrections.

4.7 Chapter Concluding Remarks
• Spatial autocorrelation is the degree of spatial dependency, association
or correlation between the value of an observation of a spatial entity and
the values of neighboring observations of the same variable.
• A major difference from statistical correlation is that, while statistical
correlation refers to two distinct variables with no reference to location
(or for the same location), spatial autocorrelation refers to the value of a
single variable at a specific location in relation to the values of the same
variable at neighboring locations.
• Lag-X is the weighted average of the values of X in a specified neighborhood.
• There are four types of arrangement in a Moran's I scatter plot: High-High
and Low-Low, expressing positive spatial autocorrelation, and High-Low
and Low-High, indicating negative spatial autocorrelation.
• Obtaining statistically significant results for the local Moran's I (e.g.,
detecting clustering) does not mean that we will obtain statistically sig-
nificant results for the global Moran's I as well.
• When calculating spatial autocorrelation, "global" implies that a single
value of the index is produced for the entire pattern, while "local" means
that a value is produced for each spatial object separately.
• While the Moran's I index cannot distinguish between high- and low-value
clustering, the General G-Statistic can. On the other hand, the General
G-Statistic is appropriate only when there is positive spatial autocorrela-
tion, as it detects the presence of hot or cold spots.
• The General G-Statistic is more an index of spatial association, or an
index of positive spatial autocorrelation, than a pure spatial
autocorrelation index.
• In incremental spatial autocorrelation, by locating the peak in the graph,
the z-score and the corresponding distance, we can better define the
distance band to be used in many spatial statistics, such as hot spot
analysis.
• More than one peak may occur. This is not wrong. Different distance
bands (peaks) might reveal underlying processes at different scales of
analysis.
• Smaller distances are often more suitable for geographical analysis at the
local scale.
• Before running incremental spatial autocorrelation, we should check
whether locational outliers exist and remove them if necessary.
• Local Moran's I is used to identify whether clusters or outliers exist in a
spatial dataset. That is why this method is also called "cluster and outlier
analysis."
• When calculating local spatial autocorrelation, permutations are used to
estimate the likelihood of generating, through complete spatial random-
ness, a spatial arrangement of values similar to the observed one.

--- Page[REDACTED PHONE] ---
/C15 Hot spot analysis cannot be used to locate outliers.
/C15 Hot spot analysis cannot be directly applied to point objects (e.g., crime
incidents) without any attribute ﬁelds attached. However, we can aggre-
gate data by using spatial join to polygons and then continue the analysis
in the usual fashion.
/C15 Optimized hot spot analysis is a procedure used to optimally select the
parameters of the Getis-Ord G∗
iindex.
/C15 It is easier to use optimized hot spot analysis instead of the Getis-Ord G∗
i
index, as long as we comprehend the outputs.
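The Lag-X idea from the bullets above can be sketched in a few lines of plain Python (a toy example with hypothetical values and a hand-built, row-standardized neighbor list, not the book's dataset):

```python
def spatial_lag(values, weights):
    # weights[i] maps each neighbor index j to the weight w_ij; with row
    # standardization the weights of each row sum to 1, so the lag is a
    # weighted average of the neighbors' values.
    return [sum(w * values[j] for j, w in weights[i].items())
            for i in range(len(values))]

# Three hypothetical polygons in a row: 0-1 and 1-2 are neighbor pairs.
income = [100, 50, 10]
w = {0: {1: 1.0},            # polygon 0 has one neighbor
     1: {0: 0.5, 2: 0.5},    # polygon 1 has two, each weighted 0.5
     2: {1: 1.0}}
print(spatial_lag(income, w))  # [50.0, 55.0, 50.0]
```

Each output value is the neighborhood average that a Moran's I scatter plot places on its vertical axis.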
Questions and Answers
The answers given here are brief. For more thorough answers, refer back to the
relevant sections of this chapter.
Q1. What is spatial autocorrelation? What types of spatial autocorrelation exist? Which are the most commonly used metrics?
A1. Spatial autocorrelation is the degree of spatial dependency, association or correlation between the value of an observation of a spatial entity and the values of neighboring observations of the same variable. There are two types of spatial autocorrelation, namely global and local. Global spatial autocorrelation measures autocorrelation by a single value for the entire study area. To estimate spatial autocorrelation at the local level, we use local measures of spatial autocorrelation. The most common global spatial autocorrelation measures are the Moran's I index and the General G-Statistic. Local measures of spatial autocorrelation are the Local Moran's I index, the Getis-Ord Gi and the Getis-Ord Gi* statistic.
Q2.Why is spatial autocorrelation important to geographical analysis and
spatial statistics?
A2. Spatial autocorrelation analysis is extremely important in geographical
studies. If spatial autocorrelation did not exist, geographical analysis
would be of little interest. In conventional statistics, the observed
samples are assumed to be independent. In the presence of spatial
autocorrelation, this assumption is violated. Observations are now
spatially clustered or dispersed. This typically means that classical statis-
tical tools are no longer valid. For example, linear regression would lead
to biased estimates or exaggerated precision. As such, spatial statistics
should be used instead.
Q3.What is incremental spatial autocorrelation, and why is it used?
A3. Incremental spatial autocorrelation is a method based on the Global Moran's I index that tests for the presence of spatial autocorrelation over a range of distance bands. It is used to approximate the appropriate scale of analysis. Instead of

arbitrarily selecting distance bands, this method identifies an appropriate fixed distance band, for which spatial autocorrelation is more pronounced. In other words, it allows us to identify the farthest distance at which an object still has a significant impact on another one. After the appropriate scale of analysis is established, local spatial autocorrelation indices and other spatial statistics can be calculated more accurately.
Q4. Why is a Moran's I scatter plot used?
A4. It is used to visualize the spatial autocorrelation statistic. It allows for a visual inspection of the spatial associations in the neighborhood of each observation (data point). In other words, it provides a representation used to assess how similar an attribute value at a location is to its neighboring ones. The slope of the regression line over the data points equals the Moran's I index value when calculated using binary and row-standardized weights. The produced scatter plot identifies which type of spatial autocorrelation exists according to where a dot lies (the dot standing for a spatial entity, such as a polygon).
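The slope property in this answer can be checked numerically. The sketch below (plain Python, toy data) computes Moran's I directly with row-standardized weights and then as the OLS slope of the spatial lag against the mean-deviated values; the two agree:

```python
def morans_i(values, weights):
    # Global Moran's I with row-standardized weights: the cross-product
    # of each deviation with its spatial lag, over the total variance.
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    lag = [sum(w * z[j] for j, w in weights[i].items()) for i in range(n)]
    num = sum(z[i] * lag[i] for i in range(n))
    den = sum(zi * zi for zi in z)
    return num / den, z, lag

# Four polygons on a ring with alternating values: a checkerboard,
# i.e., perfect negative spatial autocorrelation.
vals = [1, 0, 1, 0]
w = {i: {(i - 1) % 4: 0.5, (i + 1) % 4: 0.5} for i in range(4)}
I, z, lag = morans_i(vals, w)

# OLS slope of lag on z (both are mean-centered, so the intercept is 0).
slope = sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)
print(I, slope)  # both -1.0: every dot falls in an HL or LH quadrant
```

With clustered rather than alternating values, the same code yields a positive slope and the dots move into the HH and LL quadrants.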
Q5. How can we identify a spatial outlier with a Moran's I scatter plot? What does a Low-High arrangement mean?
A5. We can identify a spatial outlier by inspecting a Moran's I scatter plot in locations where High-Low or Low-High concentrations exist. A Low-High arrangement means that a spatial object depicted as a dot in the scatter plot has a low value (for the variable studied) and is surrounded by spatial objects with high values. It is probably a spatial outlier, but further analysis should be carried out to confirm it.
Q6. Why is it necessary to set up the right scale of analysis?
A6. The scale of analysis defines the size and shape of the neighborhoods for which spatial statistics are calculated. The scale of analysis is closely related to the problem in question. Hypothetically, unemployment clustering statistically significant at [REDACTED PHONE] m and 1,[REDACTED PHONE] m peaks reflects patterns of clustering at both the census-block level and the postcode level. If we are interested only in the census-block level, we could apply the [REDACTED PHONE] m distance in our analysis. Greater distances reflect broader, regional trends (e.g., east to west), while smaller distances reflect local trends (e.g., between neighborhoods). If we use a large scale of analysis when we are looking at a local level, we might generalize and lose hidden spatial heterogeneity.
Q7. What is cluster and outlier analysis? How can we interpret the results of the index used?
A7. It is an analysis applying the Local Moran's I to (a) identify if a clustering of high or low values exists and (b) trace spatial outliers. If the p-value is large (usually p > [REDACTED PHONE]), the results are not statistically significant. A small p-value (usually < [REDACTED PHONE]) indicates that we can reject the null hypothesis of complete spatial randomness and accept that spatial autocorrelation exists. In this case, when the z-value is positive, we have positive spatial autocorrelation and clustering. If the z-value is negative, we have negative spatial autocorrelation and a dispersed pattern of values. A high negative value is an indication of a spatial outlier.
Q8. What is a High-High or Low-Low cluster in a Local Moran's I?
A8. A high positive z-score (e.g., greater than [REDACTED PHONE]) for a spatial entity means that the neighboring spatial entities have similar values. If the values are high, then High-High clusters are formed, meaning that spatial entities with high values (for a specific variable) are surrounded by spatial entities with high values (of the same variable). If the values are low, then Low-Low clusters are formed, meaning that spatial entities with low values are surrounded by spatial entities with low values.
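The quadrant logic behind the HH/LL/HL/LH labels can be sketched as follows (plain Python, toy values; real implementations such as the Local Moran's I tool also attach permutation-based p-values, which are deliberately omitted here):

```python
def local_moran_labels(values, weights):
    # Label each location by the sign of its mean deviation and of its
    # spatial lag: HH and LL mark cluster cores, HL and LH mark
    # potential outliers. No significance test is applied.
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    labels = []
    for i in range(n):
        lag = sum(w * z[j] for j, w in weights[i].items())
        labels.append(('H' if z[i] > 0 else 'L') +
                      ('H' if lag > 0 else 'L'))
    return labels

# Five polygons in a row: one low value embedded in a high neighborhood.
vals = [90, 95, 10, 92, 88]
w = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5},
     3: {2: 0.5, 4: 0.5}, 4: {3: 1.0}}
# Polygon 2 comes out 'LH': a low value surrounded by high ones.
print(local_moran_labels(vals, w))
```
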
Q9. What is hot spot analysis, and what are a cold spot and a hot spot? Can this analysis be used with point data?
A9. Hot spot analysis identifies if low or high values of a variable are spatially clustered and create cold spots or hot spots, respectively. In the case of point features, it might be more interesting to study the intensity of the objects rather than a specific attribute. In this respect, we use hot spot analysis to identify if hot spots or cold spots of events' intensity exist. Such analysis should begin by aggregating points into some regions (e.g., postcodes, census tracts).
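The aggregation step described in this answer can be sketched as a simple fishnet count (plain Python, hypothetical coordinates): each point is assigned to a square cell, and the per-cell count becomes the weighted variable that the hot spot statistic is then run on.

```python
def aggregate_points_to_grid(points, cell_size):
    # Assign each (x, y) incident to a square grid cell and count the
    # incidents per cell; empty cells are simply absent from the result.
    counts = {}
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] = counts.get(cell, 0) + 1
    return counts

# Hypothetical crime incidents, aggregated into 100 m cells.
incidents = [(12, 8), (55, 40), (61, 45), (950, 970)]
print(aggregate_points_to_grid(incidents, 100))
# {(0, 0): 3, (9, 9): 1}
```
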
Q10. What are the main benefits of using optimized hot spot analysis?
A10. Optimized hot spot analysis performs hot spot analysis using the Getis-Ord Gi* index in an automated way to ensure optimal results. The method is used to
(a) Identify how many locational outliers exist (if any).
(b) Estimate the distance band at which the spatial autocorrelation is more pronounced (scale of analysis).
(c) Adjust for spatial dependence and multiple testing through the false discovery rate correction method.
(d) Handle point events with no variables attached. These events are automatically aggregated into weighted features within some regions (i.e., a grid). The weighted variable is then analyzed in the usual fashion.

LAB 4
SPATIAL AUTOCORRELATION
Overall Progress
Scope of the Analysis
This lab deals with
• Objective 1: Locating high income areas (see Table 1.2)
• Objective 2: Locating low crime areas
Figure 4.7 Lab 4 workflow and overall progress.

We further analyze the income distribution in the city to identify whether
spatial clustering and spatial autocorrelation exist and also to locate income
hot spots through spatial statistics (see Figure 4.7 ). Moreover, we will study the
spatial autocorrelation patterns of crime by locating cold and hot spots.
Section A ArcGIS
Exercise 4.1 Global Spatial Autocorrelation
In this exercise, we calculate the global spatial autocorrelation of income using the Moran's I index and the Getis-Ord General G-Statistic.
ArcGIS Tools to be used: Spatial Autocorrelation (Moran's I), High/Low Clustering (Getis-Ord General G)
ACTION: Calculate Global Moran's I
Navigate to the location you have stored the book dataset and
click on Lab4_SpatialAutocorrelation.mxd
Main Menu >File >Save As >My_Lab4_SpatialAutocorrelation.mxd
In I:\BookLabs\Lab4\Output
ArcToolBox >Spatial Statistics Tools >Analyzing Patterns >
Spatial Autocorrelation (Moran's I)
Input Feature Class = City (see Figure 4.8)
Input Field = Income
Generate Report = Check the box
Conceptualization of Spatial Relationships = INVERSE_DISTANCE
(See Chapter 1 for theory.)
Distance: EUCLIDEAN_DISTANCE
Standardization = ROW (See Chapter 1 for theory. We should use ROW when we have polygons and data aggregated at this level.)
Distance Band or Threshold Distance = Leave blank. This is a cutoff distance for the Inverse Distance and Fixed Distance conceptualization methods. Features outside the specified cutoff value for a target feature are ignored. By default, the tool uses the distance ensuring that each single feature has at least one

neighbor. This distance is not necessarily the appropriate one.
We can use this value to begin with and progressively increase
it to test how values of the index will vary. See Chapter 1 for
theory.
OK
Main Menu >Geoprocessing >Results >Current Session >Spatial
Autocorrelation >DC on MoransI_Result.html (see Figure 4.9)
Interpreting results: The Moran's I Index is [REDACTED PHONE] (see Figure 4.9). Given the z-score of [REDACTED PHONE] and the p-value of [REDACTED PHONE], there is a less than 1% likelihood (significance level α) that this clustered pattern is the result of random chance. In other words, we have a 99% probability (confidence level) that the distribution of income forms a clustered pattern. A slightly different way to interpret the results is as follows: the spatial arrangement of the income values has a tendency to cluster, and there is a likelihood of less than 1% that this pattern is the result of random chance. The distance threshold defined by the tool is set to [REDACTED PHONE] m, so every postcode has at least one neighbor. Global Moran's I provides a first indication of income clustering. However, we cannot locate where the clustering occurs just by using this index. Moreover, although we calculated spatial autocorrelation, we have not yet defined the appropriate scale of analysis.
Figure 4.8 Global Moran's I tool.

ACTION: Calculate High/Low Clustering (Getis-Ord General G)
ArcToolBox >Spatial Statistics Tools >Analyzing Patterns >
High/Low Clustering (Getis-Ord General G)
Input Feature Class = City (see Figure[REDACTED PHONE])
Input Field = Income
Generate Report = Check the box
Conceptualization of Spatial Relationships = INVERSE_DISTANCE (See Chapter 1 for theory)
Figure 4.9 Global Moran's I report.

Distance: EUCLIDEAN_DISTANCE
Standardization = ROW
Distance Band or Threshold Distance = Leave blank
OK
Main Menu >Geoprocessing >Results >Current Session >High/
Low Clustering (Getis-Ord General G) >DC on GeneralG_Results
.html (see Figure[REDACTED PHONE])
Main Menu >File >Save
Interpreting results: The Getis-Ord General G Index is [REDACTED PHONE] (see Figure [REDACTED PHONE]). Given the z-score of [REDACTED PHONE] (a positive value) and the p-value of [REDACTED PHONE], there is a less than 1% likelihood (significance level α) that this clustered pattern of high values is the result of random chance. In other words, we have a 99% probability (confidence level) that the distribution of income forms a clustered pattern of high values. Like the Global Moran's I, the Getis-Ord General G provides a first indication of income clustering.
Figure [REDACTED PHONE] Getis-Ord General G tool.

However, we cannot locate where the clustering occurs just by using this index. Moreover, although we calculated spatial autocorrelation, we have not yet defined the appropriate scale of analysis.
Exercise 4.2 Incremental Spatial Autocorrelation and Spatial Weights Matrix
In this exercise, we calculate the incremental spatial autocorrelation of income to define an appropriate scale of analysis. Based on this scale, the spatial weights matrix (see Section 1.9) is calculated, which is needed for local spatial statistics.
Figure [REDACTED PHONE] Getis-Ord General G report.

ArcGIS Tools to be used: Incremental Spatial Autocorrelation, Generate Spatial Weights Matrix, Convert Spatial Weights Matrix to Table
ACTION: Incremental Spatial Autocorrelation
Navigate to the location you have stored the book dataset and
click
My_Lab4_SpatialAutocorrelation.mxd
ArcToolBox >Spatial Statistics Tools >Analyzing Patterns >
Incremental Spatial Autocorrelation
Input Features = City (see Figure[REDACTED PHONE])
Input Field = Income
Number of distance bands = 10
Beginning Distance = Leave blank
Distance Increment = Leave blank
Distance = EUCLIDEAN
Row Standardization = Check the box
Figure [REDACTED PHONE] Incremental spatial autocorrelation dialog box.

Output Table = I:\BookLabs\Lab4\Output\Increment
Output Report File = I:\BookLabs\Lab4\Output\Increment.pdf
OK
Main Menu >Geoprocessing >Results >Current Session >
Incremental Spatial Autocorrelation >DC on Output Report File:
Increment.pdf
Global Moran's I Summary by Distance
Distance | Moran's Index | Expected Index | Variance | z-score | p-value
(ten distance bands; values redacted in this copy)
First Peak (Distance, Value): [REDACTED PHONE], [REDACTED PHONE]
Max Peak (Distance, Value): [REDACTED PHONE], [REDACTED PHONE]
Distance measured in Meters
Interpreting results: Prior to incremental spatial autocorrelation, we should trace whether locational outliers exist. We conducted this analysis in Exercise 3.4 and concluded that no locational outliers existed (the theoretical discussion in Section 4.3 explains how to handle locational outliers).
We observe that there are two peaks (see Figure [REDACTED PHONE]): one at 1,[REDACTED PHONE] m and one at 1,[REDACTED PHONE] m. The Moran's I values for these distances are [REDACTED PHONE] and [REDACTED PHONE], respectively (with high z-scores), indicating intense clustering. Both distances reveal a form of clustering. As mentioned in the theoretical section, there is not a single correct distance at which to perform our analysis, as the

scale largely depends on our problem. It is quite common to select the first peak. As such, the scale of analysis of income can be set to 1,[REDACTED PHONE] m (rounded [REDACTED PHONE]). The overall conclusion is that there is spatial autocorrelation of income and an underlying clustering process.
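What the tool computes can be sketched conceptually as follows (plain Python, toy coordinates, not the book's postcodes; note that the real tool picks peaks on z-scores, whereas this simplified version tracks the raw Moran's I value at each band):

```python
import math

def morans_i_fixed_band(coords, values, band):
    # Global Moran's I with binary fixed-distance neighbors, row-
    # standardized; features with no neighbor in the band are skipped.
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num = 0.0
    for i in range(n):
        nbrs = [j for j in range(n)
                if j != i and math.dist(coords[i], coords[j]) <= band]
        if nbrs:
            num += z[i] * sum(z[j] for j in nbrs) / len(nbrs)
    return num / sum(zi * zi for zi in z)

# Two hypothetical clusters ~100 m apart: high values west, low east.
coords = [(0, 0), (5, 0), (0, 5), (100, 0), (105, 0), (100, 5)]
values = [90, 95, 92, 10, 12, 11]
for band in (10, 50, 150):
    print(band, round(morans_i_fixed_band(coords, values, band), 3))
```

On this toy data the index is strongly positive at short bands (neighborhoods stay inside one cluster) and drops once the band spans both clusters, which is exactly the peak-hunting behavior the tool automates.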
ACTION: Generate Spatial Weights Matrix
ArcToolBox >Spatial Statistics Tools >Modeling Spatial
Relationships >Generate Spatial Weights Matrix
Input Feature Class = City (Navigate to I:\BookLabs\Data\City.shp) (see Figure [REDACTED PHONE])
Unique ID Field = PostCode
Output Spatial Weights Matrix File =
I:\BookLabs\Lab4\Output\CityWeights.swm
Conceptualization of Spatial Relationships = FIXED_
DISTANCE
Distance Method = EUCLIDEAN
Figure [REDACTED PHONE] Incremental spatial autocorrelation graph. z-scores are plotted over incremental distances. Peaks are highlighted with a larger circle.

Exponent = 1
Threshold Distance = [REDACTED PHONE] (as produced from incremental spatial autocorrelation in the first part of this exercise)
Number of Neighbors = 3
Row Standardization = Check
OK
ArcGIS tip: The Number of Neighbors parameter (see Figure [REDACTED PHONE]) is available only from the Generate Spatial Weights Matrix tool. The k-nearest neighbors option is also used in exploratory regression (analyzed in Chapter 6) to assess regression residuals. It takes a default value of 8.
Interpreting results: We set FIXED_DISTANCE as the function with which to conceptualize space, as this method is more appropriate for hot spot analysis (see Figure [REDACTED PHONE]). The Threshold Distance is set to [REDACTED PHONE] as the appropriate scale of analysis and is the result of the first part of this exercise (scale of analysis through incremental spatial autocorrelation). Objects lying further away than this distance will not be included in the calculation of the weights function. As we set a cutoff value for the FIXED_DISTANCE, some features may have no neighbors at this distance.
Figure [REDACTED PHONE] Generate spatial weights matrix dialog box.
To calculate the weights

matrix, however, we should have at least some minimum number of neighbors for all features. To ensure that each object has at least a minimum number of neighbors, we use the parameter Number of Neighbors. With this combination (threshold and number of neighbors), features with no neighbors (or fewer than three) inside the threshold value will finally be attached to their nearest neighbors. In other words, the threshold value is temporarily extended to ensure that each feature will have at least the minimum number of neighbors defined. In general, a spatial weights matrix is automatically generated when we apply spatial statistics by defining a conceptualization method. Nevertheless, to have more control over the weights, it is recommended to create a user-defined spatial weights matrix that can be applied thereafter. If uniformity or the isotropic environment is violated in a part of our case study, we might need to change the weights. For instance, two objects might have large weights, indicating high interaction and small distance. Due to a natural barrier (e.g., river, lake, island polygons), these objects might be close, but their interaction might be low. In such a case, we could edit the weight matrix accordingly.
ACTION: Convert Spatial Weights Matrix to Table
A typical spatial weights matrix in ArcGIS has three columns (see Figure[REDACTED PHONE] ):
ID (unique ID of the spatial object), NID (the ID of the neighboring object
with which there is a relationship) and Weight (the value of the weight that
quantifies the spatial relationship). Nonexistent spatial relationships (weight = 0) are not included in the matrix to keep the table short. The output file is in an unreadable format. To read and edit the weights for each set of spatial objects, we must convert the .swm file into a table using the Convert Spatial Weights Matrix to Table tool.
ArcToolBox >Spatial Statistics Tools >Utilities >Convert
Spatial Weights Matrix to Table
Input Spatial Weights Matrix File =
I:\BookLabs\Lab4\Output\CityWeights.swm
Output Table = I:\BookLabs\Lab4\Output\CityWeights
OK
TOC >List By Source >RC CityWeights >Open
Close table
Main Menu >File >Save

Interpreting results: NID is the ID of the neighboring object, and WEIGHT is the calculated weight (see Figure [REDACTED PHONE]). For example, postcode [REDACTED PHONE] has four neighbors ([REDACTED PHONE], [REDACTED PHONE], [REDACTED PHONE], [REDACTED PHONE]) within a fixed distance of 1,[REDACTED PHONE] m (the distance between the polygon centroids); that is why the weight is [REDACTED PHONE] on each.
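The fixed-distance rule with a minimum-neighbors fallback, as configured above, can be sketched in plain Python (toy coordinates, not the book's postcodes):

```python
import math

def fixed_distance_weights(coords, threshold, min_neighbors=3):
    # Neighbors are features within `threshold`; if a feature has fewer
    # than `min_neighbors`, the band is effectively extended by taking
    # its nearest `min_neighbors` features instead. Rows are then
    # standardized, so a feature with four neighbors gets 0.25 on each.
    weights = {}
    for i, ci in enumerate(coords):
        by_dist = sorted((math.dist(ci, cj), j)
                         for j, cj in enumerate(coords) if j != i)
        nbrs = [j for d, j in by_dist if d <= threshold]
        if len(nbrs) < min_neighbors:
            nbrs = [j for _, j in by_dist[:min_neighbors]]
        weights[i] = {j: 1 / len(nbrs) for j in nbrs}
    return weights

# Five hypothetical centroids; point 4 is isolated.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (50, 50)]
w = fixed_distance_weights(pts, threshold=2)
print(w[0])   # three in-band neighbors, 1/3 each
print(w[4])   # isolated: falls back to its 3 nearest, 1/3 each
```
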
Exercise 4.3 Cluster and Outlier Analysis (Anselin Local Moran's I)
In this exercise, we calculate the local spatial autocorrelation of income using the Local Moran's I to identify if clusters and outliers exist.
ArcGIS Tools to be used: Cluster and Outlier Analysis
ACTION: Cluster and Outlier Analysis
Figure [REDACTED PHONE] Spatial weights table.
Navigate to the location you have stored the book dataset and click

My_Lab4_ SpatialAutocorrelation.mxd
ArcToolBox >Spatial Statistics Tools >Mapping Clusters >
Cluster and Outlier Analysis
Input Feature Class = City (see Figure[REDACTED PHONE])
Input Field = Income
Output Feature Class = I:\BookLabs\Lab4\Output\LocalMoranI.shp
Conceptualization of Spatial Relationships =
GET_SPATIAL_WEIGHTS_FROM_FILE
Weights Matrix File = I:\BookLabs\Lab4\Output\CityWeights.swm
Apply False Discovery Rate (FDR) Correction = Check
OK
TOC >RC LocalMoranI >Open Attribute Table
Close Table
Main Menu >File >Save
Figure [REDACTED PHONE] Local Moran's I dialog box.

Figure [REDACTED PHONE] Local Moran's I output map.
Figure [REDACTED PHONE] Local Moran's I table with the local Moran's I index value, z-score, p-value and type of cluster assigned to each object.

Interpreting results: A new layer is added to the table of contents (see Figure [REDACTED PHONE]). The COType field in the Output Feature Class indicates if a postcode is an outlier or if it belongs to a cluster (see Figure [REDACTED PHONE]). If the postcode has high income and is surrounded by postcodes with low incomes, it is marked as HL. If the postcode has low income and is surrounded by postcodes with high income, it is marked as LH. If postcodes are clustered, the COType field is HH for a statistically significant cluster of high-income values and LL for a statistically significant cluster of low-income values. The attribute table also shows the local Moran's I value, the z-score and the p-value. In this example and with FDR applied, income is positively spatially autocorrelated, and a statistically significant clustering of High-High values is observed in the center of the city at the 99% confidence level. No outliers or clusters of low values are detected elsewhere. In other words, people with high incomes tend to live in the red areas located in and around the downtown area.
Exercise 4.4 Hot Spot Analysis (Getis-Ord Gi*) and Optimized Hot Spot Analysis
In this exercise, we calculate local spatial autocorrelation to identify income hot spots and cold spots using the local Getis-Ord Gi* index.
ArcGIS Tools to be used: Hot Spot Analysis, Optimized Hot Spot Analysis
ACTION: Hot Spot Analysis
Navigate to the location you have stored the book dataset and
click
My_Lab4_ SpatialAutocorrelation.mxd
ArcToolBox >Spatial Statistics Tools >Mapping Clusters >Hot
Spot Analysis
Input Feature Class = City (see Figure[REDACTED PHONE])
Input Field = Income
Output Feature Class = I:\BookLabs\Lab4\Output\HotSpotIncome.shp
Conceptualization of Spatial Relationships =

GET_SPATIAL_WEIGHTS_FROM_FILE
Self Potential Field = Leave blank
Weights Matrix File = I:\BookLabs\Lab4\Output\CityWeights.swm
(We can also directly use FIXED_DISTANCE_BAND as the conceptualization method and add the distance threshold. Still, this option does not allow for specifying a minimum number of nearest neighbors. It is advised to use a spatial weights matrix from file; see Exercise 4.2.)
Apply False Discovery Rate (FDR) Correction = Do not check (Check what happens when FDR is checked and refer back to theory)
OK
Interpreting results: A new layer is added to the table of contents (see Figure [REDACTED PHONE]). The Gi_Bin field in the Output Feature Class indicates whether there is a hot spot or a cold spot and the related confidence level. In our example, we locate a statistically significant hot spot in and around the city center and a statistically significant cold spot in the western part of the city.
Figure [REDACTED PHONE] Hot spot analysis dialog box.

Nonsignificant results mean that there is no indication of income clustering in these postcodes. An income cold spot means that polygons with low income values are surrounded by polygons with low income values. An income hot spot means that polygons with high income values are surrounded by polygons with high values. We notice that the hot spot analysis identifies a cluster of low values in addition to those identified in the cluster and outlier analysis in Exercise 4.3. This means that it is better to run both statistics and evaluate the results comparatively. As we are looking for high-income areas, the red areas might be more appropriate as locations for the coffee shop.
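For reference, the statistic behind the tool can be sketched as follows (plain Python with binary weights and toy values; the neighborhood of each feature includes the feature itself, which is the "*" in Gi*):

```python
import math

def getis_ord_g_star(values, neighborhoods):
    # neighborhoods[i]: indices within the distance band, including i
    # itself (binary weights). Returns one z-score per feature: a large
    # positive value suggests a hot spot, a large negative one a cold spot.
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - xbar * xbar)
    scores = []
    for nbrs in neighborhoods:
        w_sum = len(nbrs)                      # sum of binary weights
        num = sum(values[j] for j in nbrs) - xbar * w_sum
        den = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(num / den)
    return scores

# Nine polygons in a row: high incomes on the left, low on the right.
vals = [90, 95, 92, 12, 10, 11, 9, 12, 10]
hoods = [sorted({max(i - 1, 0), i, min(i + 1, 8)}) for i in range(9)]
g = getis_ord_g_star(vals, hoods)
print(round(g[1], 2), round(g[7], 2))  # positive vs. negative score
```

The sign pattern of the scores mirrors the map symbology: features inside the high-value run score positive (hot), features inside the low-value run score negative (cold).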
ACTION: Optimized Hot Spot Analysis
ArcToolBox >Spatial Statistics Tools >Mapping Clusters >
Optimized Hot Spot Analysis
Figure [REDACTED PHONE] Hot spot analysis output map indicating cold (blue) and hot (red) spots.

Input Features = City (see Figure[REDACTED PHONE])
Output Features =
I:\BookLabs\Lab4\Output\C3_IncomeOptimizedHotSpot.shp
Analysis Field = Income
OK
********************Initial Data Assessment********************
Making sure there are enough weighted features for analysis ....
- There are 90 valid input features.
Evaluating the Analysis Field values ....
- INCOME Properties:
Min: [REDACTED PHONE]
Max: [REDACTED PHONE]
Mean: [REDACTED PHONE]
Std. Dev.: [REDACTED PHONE]
Looking for locational outliers ....
- There were no outlier locations found.
Figure [REDACTED PHONE] Optimized hot spot analysis dialog box.

**********************Scale of Analysis***********************
Looking for an optimal scale of analysis by assessing the
intensity of clustering at increasing distances ....
- The optimal fixed distance band is based on peak clustering
found at[REDACTED PHONE] Meters
**********************Hot Spot Analysis************************
Finding statistically significant clusters of high and low
INCOME values ....
- There are 38 output features statistically significant based
on an FDR correction for multiple testing and spatial dependence.
***************************Output*****************************
Creating output feature class:
I:\BookLabs\Lab4\Output\C3_IncomeOptimizedHotSpot.shp
- Red output features represent hot spots where high INCOME
values cluster.
- Blue output features represent cold spots where low INCOME
values cluster.
The above results can also be found through:
Main Menu >Geoprocessing >Results >Current Session >
Optimized Hot Spot Analysis >Messages
Close Results
Main Menu >File >Save

Interpreting results: Optimized hot spot analysis runs the entire procedure in an automated way, so it saves significant analysis time. The results are similar to those of the hot spot analysis (because the hot spot analysis used a similar procedure but in a non-automated way; see Figure [REDACTED PHONE]). The tool reports the scale of analysis (1,[REDACTED PHONE] m) as well as the presence (or not) of locational outliers. Hot spots of income are potential areas for the location of the new coffee shop (see Box 4.1).
Figure [REDACTED PHONE] Optimized hot spot analysis output map.

Box 4.1 Analysis Criterion C3 to Be Used in Synthesis Lab 5.4: The location of the coffee shop should lie within income hot spot areas. [C3_IncomeHotSpot.lyr]
TOC >RC C3_IncomeOptimizedHotSpot >Save As Layer File >
C3_IncomeHotSpot.lyr >Save
Exercise 4.5 Optimized Hot Spot Analysis for Crime Events
In this exercise, we perform optimized hot spot analysis of[REDACTED PHONE] crime events
for a period of two years to identify hot spots and cold spots of crime
incidents (see Figure[REDACTED PHONE] ).
ArcGIS Tools to be used: Optimized Hot Spot Analysis, Hot spot
analysis
ACTION: Optimized Hot Spot Analysis
Navigate to the location you have stored the book dataset and
click
My_Lab4_ SpatialAutocorrelation.mxd
TOC >List By Drawing Order >Drag Crime.shp to the top of all layers.
ArcToolBox >Spatial Statistics Tools >Mapping Clusters >
Optimized Hot Spot Analysis
Input Feature Class = Crime (see Figure[REDACTED PHONE])
Output Feature Class =
I:\BookLabs\Lab4\Output\C4_CrimeOptimizedHotSpot.shp
Leave other blank / default
OK

Figure[REDACTED PHONE] Optimized hot spot analysis dialog box.
********************Initial Data Assessment********************
Making sure there are enough incidents for analysis ....
- There are[REDACTED PHONE] valid input features.
Looking for locational outliers ....
- There were 5 outlier locations; these will not be used to
compute the polygon cell size.
**********************Incident Aggregation*********************
Creating fishnet polygon mesh to use for aggregating incidents ....
- Using a polygon cell size of[REDACTED PHONE] Meters
Counting the number of incidents in each polygon cell ....
- Analysis is performed on all polygon cells containing at least one incident.

Evaluating incident counts and number of polygons ....
- The aggregation process resulted in[REDACTED PHONE] weighted polygons.
- Incident Count Properties:
Min: [REDACTED PHONE]
Max: [REDACTED PHONE]
Mean: [REDACTED PHONE]
Std. Dev.: [REDACTED PHONE]
***********************Scale of Analysis***********************
Looking for an optimal scale of analysis by assessing the
intensity of clustering at increasing distances ....
- The optimal fixed distance band is based on peak clustering
found at[REDACTED PHONE] Meters
************************Hot Spot Analysis**********************
Finding statistically significant clusters of high and low
incident counts ....
- There are 82 output features statistically significant
based on an FDR correction for multiple testing and spatial
dependence.
***************************Output*****************************
Creating output feature class:
I:\BookLabs\Lab4\Output\C4_CrimeOptimizedHotSpot.shp
- Red output features represent hot spots where high incident
counts cluster.
- Blue output features represent cold spots where low incident counts cluster.

The above results can be found through:
Main Menu >Geoprocessing >Results >Current Session >
Optimized Hot Spot Analysis >Messages
Close Results
Main Menu >File >Save
Figure[REDACTED PHONE] Optimized hot spot analysis dialog box.

Interpreting results: Optimized hot spot analysis for point entities aggregates events into polygon cells (see Figure[REDACTED PHONE]). To calculate the polygon cell size, five locational outliers are removed. Crime events are then aggregated into polygon cells with a size of[REDACTED PHONE] m. The optimal fixed distance is identified at[REDACTED PHONE] m, and FDR correction is applied. Crime events are scattered all over the study area, but one cold spot and three hot spots of crime are identified, as shown on the map (see Figure[REDACTED PHONE]). The distance ([REDACTED PHONE] m) at which autocorrelation is most pronounced reveals that the hot spots are quite large (relative to the size of the case study area) and that crime is a significant problem in three regions of the city. These hot spots also reflect the centers of the real clusters of crime; in this sense, crime might be evident in polygons adjacent to the hot spot polygons as well. Crime hot spots should be excluded as candidates for the new coffee shop's location (see Box 4.2).
Box 4.2 Analysis Criterion C4 to Be Used in Synthesis Lab 5.4: The location of the coffee shop should not lie within crime hot spot areas.
[C4_CrimeHotSpot.lyr ]
TOC >RC C4_CrimeOptimizedHotSpot >Save As Layer File >
C4_CrimeHotSpot.lyr >Save
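The incident aggregation step reported in the tool's messages can be sketched in Python (a toy illustration, not ArcGIS code; the helper name, coordinates and cell size are made up):

```python
import numpy as np

def aggregate_to_fishnet(xs, ys, cell_size):
    """Count events per square fishnet cell, keeping only non-empty cells
    (the tool likewise analyzes only cells containing at least one incident)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    cols = np.floor((xs - xs.min()) / cell_size).astype(int)
    rows = np.floor((ys - ys.min()) / cell_size).astype(int)
    counts = {}
    for rc in zip(rows, cols):
        counts[rc] = counts.get(rc, 0) + 1
    return counts  # {(row, col): incident count}

# toy coordinates in meters, 100 m cells
counts = aggregate_to_fishnet([10, 20, 220, 230, 240, 900],
                              [10, 15, 220, 230, 235, 900], 100.0)
```

The per-cell counts then become the weighted values on which the Gi* statistic is computed.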
Section B GeoDa
Exercises 4.1 and 4.2 Global Spatial Autocorrelation and Spatial Weights Matrix
In this exercise, we calculate the global spatial autocorrelation of income using the Moran's I index and the Getis-Ord General G-statistic. Before doing so, we should create the spatial weights matrix. Unlike the exercises in Section A, Exercises 4.1 and 4.2 are presented in reverse order because GeoDa requires that the spatial weights be created by the user, while ArcGIS allows for automatic calculation when the spatial autocorrelation tools are executed.

GeoDa Tools to be used: Weights Manager, Univariate Moran's I
ACTION: Create the Spatial Weights Matrix
Navigate to the location where you have stored the book dataset and
click the Lab4_SpatialAutocorrelation_GeoDa.gda
Main Menu >Tools >Weights Manager >Create >
Select ID Variable = PostCode (see Figure[REDACTED PHONE])
TAB = Distance Weight
TAB = Distance band >Specify bandwidth >Leave default
value:[REDACTED PHONE]
Check the "Use inverse distance" option. Set Power to 1.
Create
File name = CityGeoDa (see Figure[REDACTED PHONE])
Figure[REDACTED PHONE] Calculating spatial weights dialog box.

Save as type = gwt (inside folder Lab4/GeoDa)
The weights manager dialog box is updated
Save >Close >Close Weights Manager window
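What the Weights Manager produces here, inverse-distance weights within a fixed distance band, can be sketched in Python (a toy illustration; the coordinates and band value are made up):

```python
import numpy as np

def inverse_distance_band(coords, band, power=1):
    """Weight w_ij = 1 / d_ij**power if 0 < d_ij <= band, else 0
    (a sketch of distance-band weights with the inverse-distance option)."""
    pts = np.asarray(coords, float)
    # all pairwise Euclidean distances
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    W = np.zeros_like(d)
    mask = (d > 0) & (d <= band)   # exclude self and points beyond the band
    W[mask] = 1.0 / d[mask] ** power
    return W

# four toy centroids; only pairs within 5 units get a nonzero weight
W = inverse_distance_band([(0, 0), (3, 0), (0, 4), (10, 10)], band=5.0)
```

Nearby pairs get larger weights (closer means more influence), and pairs beyond the band get zero; GeoDa stores such a matrix sparsely in the .gwt file.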
ACTION: Calculate Global Moran ’sI
Main Menu >Space >Univariate Moran ’sI >
First Variable (X) = Income (see Figure[REDACTED PHONE])
Weights = CityGeoDa
OK
Figure[REDACTED PHONE] Weights manager showing the CityGeoDa spatial weights file.

Figure[REDACTED PHONE] Setting the variable and weights file for Moran's I.
Figure[REDACTED PHONE] Moran's I scatter plot with a Moran's I index of[REDACTED PHONE]. The graph reveals high positive autocorrelation.

You can save the graph as an image file if you wish.
Permutations are used to estimate how likely it is that a spatial arrangement of values similar to the one we observe would be produced under complete spatial randomness. We use Monte Carlo simulation and the following procedure.
RC on the scatter plot >Randomization >[REDACTED PHONE] permutations (see
Figure[REDACTED PHONE])
Interpreting results: The results of this example suggest that the observed value I =[REDACTED PHONE] is highly significant and not a result of spatial randomness, as it lies far away from the rest of the values (see Figure[REDACTED PHONE]) (z-score =[REDACTED PHONE]; the green vertical line at the right depicts the Moran's I statistic value). The rest of the values depict the distribution that results from complete spatial randomness. The number of permutations, the pseudo p-value significance level, the expected (theoretical) Moran's I value E(I), the observed Moran's I (I), the standard deviation, the z-value and the mean of the reference distribution are also presented. With[REDACTED PHONE] permutations, the pseudo p-value is set to[REDACTED PHONE], indicating that none of the[REDACTED PHONE] random patterns' Moran's I values surpassed the observed value. As such, the spatial arrangement of the income values has a tendency to cluster. The distance threshold defined by the tool is set to 1,[REDACTED PHONE] m so that every postcode has at least one neighbor. This is a first indication of income clustering. However, we cannot locate where the clustering occurs just by using this index. Moreover, although we calculated spatial autocorrelation, we have not yet defined the appropriate scale of analysis. GeoDa does not offer an automated tool for incremental spatial autocorrelation as ArcGIS does; this analysis is therefore not carried out here. As the scale of analysis, we will use the outcome of the incremental spatial autocorrelation as presented in Exercise 4.2 in Section A, which is[REDACTED PHONE] m.
Figure[REDACTED PHONE] Monte Carlo simulation with[REDACTED PHONE] permutations.
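The randomization procedure behind the pseudo p-value can be sketched in Python (a toy illustration with made-up values and a binary contiguity matrix, not GeoDa code):

```python
import numpy as np

def morans_i(x, W):
    """Global Moran's I: (n / sum(W)) * (z' W z) / (z' z), z = x - mean(x)."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

def moran_pseudo_p(x, W, n_perm=199, seed=42):
    """Monte Carlo pseudo p-value: shuffle the values over the locations and
    count how often a random arrangement clusters at least as strongly."""
    rng = np.random.default_rng(seed)
    obs = morans_i(x, W)
    hits = sum(morans_i(rng.permutation(x), W) >= obs for _ in range(n_perm))
    return obs, (hits + 1) / (n_perm + 1)

# toy example: 16 units on a line, binary contiguity, strongly clustered values
n = 16
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
x = np.array([0.0] * 8 + [1.0] * 8)
obs, p = moran_pseudo_p(x, W)
```

Because the low and high values form two solid blocks, the observed I is large and almost no random shuffle matches it, so the pseudo p-value is near its minimum of 1/(permutations + 1).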

Tip: The random seed used for permutations might cause the results to differ slightly among different computers, or sometimes even between runs on the same machine.
Exercise 4.3 Cluster and Outlier Analysis (Anselin Local Moran's I)
In this exercise, we calculate the local spatial autocorrelation of income using Local Moran's I to identify clusters and outliers. As mentioned in the previous exercise, the scale of analysis is 1,150 m. Before we calculate the Local Moran's I, we should recalculate the spatial weights to reflect the adopted scale of analysis.
GeoDa Tools to be used: Weights Manager, Univariate Local Moran's I, Moran's scatter plot
Figure[REDACTED PHONE] Monte Carlo reference distribution for[REDACTED PHONE] permutations. The panel reports the number of permutations, the pseudo p-value, E[I], the observed I, the standard deviation, the z-value and the mean of the reference distribution.

ACTION: Weights Manager
Navigate to the location where you have stored the book dataset and
click Lab4_SpatialAutocorrelation_GeoDa.gda
Main Menu >Tools >Weights Manager >Create
Select ID Variable = PostCode
TAB = Distance Weight
TAB = Distance band >Specify bandwidth = [REDACTED PHONE]
Check "Use inverse distance". Set Power to 1.
Create
File name = CityGeoDa1150
Save as type = gwt (inside folder Lab4/GeoDa)
Save >OK>Close >Close Weights Manager window
ACTION: Cluster and Outlier Analysis
Main Menu >Space >Univariate Local Moran's I >Income >
Weights = CityGeoDa1150 >OK
Check: Significance Map
Check: Cluster Map
Check: Moran Scatter Plot
OK
Save Project (see Figure[REDACTED PHONE])
Interpreting results: If a postcode has high income and is surrounded by postcodes with low income, it is marked as High-Low. If a postcode has low income and is surrounded by postcodes with high income, it is marked as Low-High. Where postcodes with similar values cluster, they are labeled High-High for a statistically significant cluster of high-income values and Low-Low for a statistically significant cluster of low-income values.
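This labeling rule can be sketched in Python (a toy illustration, not GeoDa's implementation; the significance flags would come from a permutation test on each unit's Local Moran's I):

```python
import numpy as np

def lisa_labels(values, W, significant):
    """Label units High-High / Low-Low / High-Low / Low-High from the sign
    of the standardized value and of its spatial lag (row-standardized W);
    `significant` flags units that passed the Local Moran pseudo p test."""
    z = (values - values.mean()) / values.std()
    lag = (W / W.sum(axis=1, keepdims=True)) @ z  # mean of each unit's neighbors
    labels = []
    for zi, li, sig in zip(z, lag, significant):
        if not sig:
            labels.append("Not significant")
        elif zi > 0:
            labels.append("High-High" if li > 0 else "High-Low")
        else:
            labels.append("Low-Low" if li < 0 else "Low-High")
    return labels

# toy example: 4 postcodes in a row, binary contiguity, incomes high -> low
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
income = np.array([10.0, 8.0, 2.0, 1.0])
labels = lisa_labels(income, W, [True, True, True, False])
```

The first pair of units sits above the mean with above-mean neighbors (High-High), the next sits below with below-mean neighbors (Low-Low), and any unit that fails the significance test stays unlabeled.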
In this example, and without FDR correction, income is positively spatially autocorrelated, and a statistically significant clustering of high values is observed in the center of the city at the 99% confidence level. In other words, people with high incomes tend to live in the red areas located in and around the downtown area. A cluster of low values is detected in the western parts of the city (this cluster is not identified with ArcGIS due to the FDR correction). One outlier of Low-High values and five outliers of High-Low values are also located in the study area.
Figure[REDACTED PHONE] Clusters and significance map for three levels of significance ([REDACTED PHONE], [REDACTED PHONE], [REDACTED PHONE]). Unlike with the ArcGIS output in Figure[REDACTED PHONE], we have not applied FDR correction, and more postcodes are thus statistically significant. FDR can be applied manually in GeoDa.
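The FDR correction can be applied manually with the Benjamini-Hochberg procedure, sketched here in Python (the pseudo p-values are made up for illustration):

```python
import numpy as np

def fdr_cutoff(pvalues, alpha=0.05):
    """Benjamini-Hochberg FDR: find the largest sorted p-value p_(k) with
    p_(k) <= (k/m) * alpha; tests at or below this cutoff stay significant."""
    p = np.sort(np.asarray(pvalues, float))
    m = len(p)
    below = p <= np.arange(1, m + 1) / m * alpha
    return p[below].max() if below.any() else 0.0

# made-up pseudo p-values for six postcodes
ps = [0.001, 0.008, 0.012, 0.04, 0.2, 0.6]
cutoff = fdr_cutoff(ps, alpha=0.05)
significant = [pv <= cutoff for pv in ps]
```

The cutoff is stricter than a flat 0.05 threshold, which is why fewer postcodes survive once the many simultaneous local tests are corrected.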

Exercise 4.4 Hot Spot Analysis (Getis-Ord Gi*)
In this exercise, we calculate the local spatial autocorrelation of income using the local Getis-Ord Gi* index to identify hot spots and cold spots (optimized hot spot analysis is not carried out, as GeoDa does not offer such a tool).
GeoDa Tools to be used: Local G*
ACTION: Local G*
Navigate to the location where you have stored the book dataset and
click Lab4_SpatialAutocorrelation_GeoDa.gda
Main Menu >Space >Local G* >Income >Weights = CityGeoDa1150
>OK
Figure[REDACTED PHONE] Hot spot analysis output map indicating cold (blue) and hot (red) spots significant at the p <=[REDACTED PHONE] level. There are minor differences from the results shown in Figure[REDACTED PHONE] using ArcGIS due to the slightly different weights matrix used. However, the main conclusions regarding the presence of hot and cold spots remain unchanged.

Check: Cluster Map
Check: using row-standardized weights
Save project
Interpreting results: We locate a statistically significant hot spot in and around the city center and a statistically significant cold spot in the western part of the city (see Figure[REDACTED PHONE]). Nonsignificant results mean that there is no indication of income clustering for these postcodes. A cold spot of income means that polygons with low income values are surrounded by polygons with low income values. A hot spot of income means that polygons with high income values are surrounded by polygons with high income values. As we are looking for areas with high income, the red areas might be more appropriate as locations for the coffee shop.
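The statistic behind this map can be sketched in Python (a toy illustration of the standard Gi* z-score formula on a made-up one-dimensional study area):

```python
import numpy as np

def getis_ord_gi_star(x, W):
    """Getis-Ord Gi* z-scores. W must include the focal unit itself
    (w_ii > 0), per the 'star' version; positive z suggests a hot spot,
    negative z a cold spot."""
    x = np.asarray(x, float)
    n = len(x)
    xbar, S = x.mean(), x.std()          # S: population standard deviation
    Wi = W.sum(axis=1)                   # sum of weights for each unit
    num = W @ x - xbar * Wi              # local weighted sum vs. its expectation
    den = S * np.sqrt((n * (W ** 2).sum(axis=1) - Wi ** 2) / (n - 1))
    return num / den

# toy example: 9 units on a line, binary contiguity plus self-weight
n = 9
W = np.eye(n)
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
x = np.array([10.0, 10.0, 10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
z = getis_ord_gi_star(x, W)
```

Units embedded in the high-value block get large positive z-scores (hot spot) and units deep in the low-value block get negative ones (cold spot), mirroring the red and blue areas on the map.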
Remark: Optimized hot spot analysis is not offered in GeoDa, and Exercise 4.5 is presented only in Section A.

5 Multivariate Data in Geography
Data Reduction and Clustering
Learning Objectives
This chapter deals with multivariate statistical methods for data reduction and clustering, commonly used in geographical analysis, such as
• Principal component analysis
• Factor analysis
• Multidimensional scaling
• Hierarchical clustering
• k-means clustering
• Regionalization (SKATER, REDCAP)
• Density-based clustering (DBSCAN, HDBSCAN, OPTICS)
• Similarity analysis (cosine similarity)
After a thorough study of the theory and lab sections, you will be able to
• Understand why multivariate data and statistics are essential in geographical analysis, as, for example, in geodemographics
• Understand that observations in multivariate datasets are points in a multidimensional data space
• Understand what principal components are and how they can be mapped in a GIS environment
• Map multidimensional datasets to a 2-d or 3-d representation by multidimensional scaling
• Understand why hierarchical clustering is important for identifying the structure of clusters
• Use the k-means algorithm in a geographical problem
• Evaluate the importance of taking into account spatial constraints in clustering (regionalization)
• Use density-based clustering to analyze large datasets of point entities
• Apply similarity analysis to identify common characteristics (profiles) of your spatial entities
• Perform principal component analysis, multidimensional scaling and hierarchical clustering in Matlab

• Conduct k-means clustering, similarity analysis, and spatial clustering in ArcGIS
• Conduct k-means clustering and spatial clustering in GeoDa
5.1 Multivariate Data Analysis
Deﬁnitions
Multivariate data are data with more than two values recorded for each observation (O'Sullivan & Unwin[REDACTED PHONE], p.[REDACTED PHONE]). A typical representation of a multivariate dataset A, with n observations and p variables, is through an n × p matrix (5.1):

$$A = \begin{bmatrix} a_{1,1} & \cdots & a_{1,p} \\ \vdots & \ddots & \vdots \\ a_{n,1} & \cdots & a_{n,p} \end{bmatrix} \quad (5.1)$$

where columns represent the p variables and rows represent the n observations.

Multivariate data exist in a multidimensional space where the number of dimensions equals the number of variables. For example, a case study area with 64 spatial units (e.g., postcodes) and 50 variables is a p = 50-dimensional dataset consisting of n = 64 observations (see Table 5.1).
Multivariate statistical analysis is a collection of statistical methods to
analyze multivariate data.
Multivariate statistical analysis methods use various statistical distance
metrics (e.g., Euclidean, Minkowski, Manhattan, see Chapter 1) to express
dissimilarity (or similarity) among observations and project them into a new
multidimensional space. The main advantage of Euclidean distance in a multi-
dimensional space is that it is interpreted more easily compared to other
distance metrics. Whichever statistical distance is used, small distances reflect similarity among observations, while large distances reveal dissimilarity. Values are stored in a matrix called the dissimilarity matrix.
Table 5.1 Multivariate dataset with spatial reference (postcode polygons). Columns: n, Postcode ID, Population, Income, Unemployment, ..., Medical Expenses; rows: the 64 postcode observations (entries α1,1 through α64,p).

The dissimilarity matrix is a square, symmetric matrix that stores the pairwise dissimilarities among n observations (data points), calculated through any statistical distance metric d (5.2). The diagonal cells are defined as zero (the distance of each observation from itself), while the off-diagonal cells store the pairwise dissimilarities.

$$D = \begin{bmatrix} 0 & d(1,2) & d(1,3) & \cdots & d(1,n) \\ d(2,1) & 0 & d(2,3) & \cdots & d(2,n) \\ d(3,1) & d(3,2) & 0 & \cdots & d(3,n) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d(n,1) & d(n,2) & d(n,3) & \cdots & 0 \end{bmatrix} \quad (5.2)$$
For example, for any given multivariate dataset A, we can calculate the statistical distance of any two observations c, d using the Euclidean distance norm extended to a multidimensional space (5.3):

$$dist(c,d) = \sqrt{(a_{c1}-a_{d1})^2 + (a_{c2}-a_{d2})^2 + \cdots + (a_{cp}-a_{dp})^2} \quad (5.3)$$

where c, d are observations and 1, 2, ..., p are variables.
In practice, for any two observations (rows), we subtract the values of the same variables (columns), square the differences, add them and finally take the square root, just as if they were coordinates of a point. For the example given in Table 5.1, the statistical distance between postcodes[REDACTED PHONE] and[REDACTED PHONE] is (in practice, we usually standardize data before calculating distances; see Section 2.4):

$$dist([REDACTED\ PHONE],[REDACTED\ PHONE]) = \sqrt{([REDACTED\ PHONE]-24{,}585)^2 + (25{,}000-17{,}250)^2 + \cdots + ([REDACTED\ PHONE]-550)^2}$$

and an indicative dissimilarity matrix would then store this value, together with all other pairwise distances, in the corresponding symmetric off-diagonal cells, with zeros on the diagonal.
Although most of the methods used to analyze multivariate data are purely statistical, spatial extensions have been proposed to reflect the underlying geography. For this reason, this chapter presents basic multivariate statistical methods, from a spatial perspective, for data reduction and data clustering, such as
• Principal Component Analysis (PCA)
• Factor Analysis (FA)
• Multidimensional Scaling (MDS)
• Cluster Analysis (for classifying observations)

• Regionalization, when clustering is made with spatial constraints
• Similarity analysis
Why Use
Vast amounts of data are collected daily from various sources, such as satellite and environmental sensors, web geolocation services and social media (Grekousis et al. 2013b). Integrating these data into existing datasets, derived from national censuses or other repositories, offers a wealth of information if analyzed wisely (Grekousis et al. 2019a). Multivariate techniques delve into this endless pool of data to discover patterns and unexpected trends or behaviors, and to extract hidden knowledge valuable for spatial analysis and spatial planning.
Multivariate statistical analysis methods presented in this chapter are used to
(a) Eliminate collinearity
(b) Reduce the dimensions of multivariate data (group variables)
(c) Uncover latent variables
(d) Map observations to lower dimensions
(e) Cluster objects into homogeneous groups (group observations)
Discussion and Practical Guidelines
Selecting the most appropriate variables for any geographical analysis is not a trivial problem. Including all available variables would probably lead to serious multicollinearity issues; that is, many variables would provide relatively little new information. Multicollinearity exists among two or more variables in a dataset when they are highly correlated (see Chapter 6). In addition, a large number of variables may lead to overrepresentation of some categories. For example, we might have eight lifestyle variables related to education and how people spend their free time on outdoor activities. If only two variables refer to education, this may lead to the creation of a distance matrix that emphasizes free-time differences rather than educational differences. For this reason, a careful inspection of the variables to be selected is necessary so that variables are balanced and multicollinearity is kept low. On the other hand, selecting only a few variables, based on previous experience or intuition, may lead to information loss, as the excluded variables might reveal important hidden (latent) information.
In this respect, data reduction methods such as PCA, FA and MDS are essential in multivariate data analysis, as they reduce the number of columns in a dataset. The main advantage of PCA and FA is that they reveal latent variables, uncovering hidden interactions. Moreover, by removing multicollinearity, components and factors are uncorrelated with each other. As a result, they can be used as independent explanatory variables in a subsequent regression analysis (Wang[REDACTED PHONE]). MDS's main characteristic is that it maps observations to two or three dimensions, providing a graphical representation in which similar objects cluster together and dissimilar objects lie apart.
Clustering analysis, on the other hand, has the advantage of grouping observations while retaining the number of variables, a significant difference from the data reduction methods above. Clustering analysis is important in a geographical context, as it reveals various hidden underlying spatial processes at play. When clustering takes spatial constraints into account, regionalization methods are used. The main advantage of regionalization methods is that they produce homogeneous clusters of spatial features that are also adjacent. These methods are fundamental in decision making and spatial planning (Grekousis et al. 2013c). Finally, similarity analysis is beneficial when we need to rank spatial features according to how similar or dissimilar they are.
A typical problem in multivariate datasets is that variables are not always on the same measurement scale or in the same units. Some variables might take extremely small values, while others take large ones. For example, in a household survey, the variable "Number of children" usually takes values less than 10, while the "Annual household income" variable might be some thousands of dollars. Large values tend to dominate the results, and for this reason, values should be standardized to z-scores using the mean value of each variable and its standard deviation (see Section 2.4; O'Sullivan & Unwin[REDACTED PHONE], p.[REDACTED PHONE]). By standardizing values, variables are no longer dependent on the measurement scale and are comparable to each other (Wang[REDACTED PHONE], p.[REDACTED PHONE]). Normalization is another method used to rescale data to the same range of values and is widely used before any statistical distance is calculated. See Section 2.4 for more details on the differences between normalization and standardization. Standardization is occasionally preferred over normalization, as it better retains the importance of each variable because its output is not bounded. For example, in the case of outliers, normalized data are squeezed into a small range, and as such, when dissimilarities (through statistical distances) are calculated, they contribute less to the final values.
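The two rescaling options can be sketched in Python (toy survey data, with the scale gap between the two variables exaggerated on purpose):

```python
import numpy as np

def zscore(X):
    """Column-wise z-score standardization: (x - mean) / std."""
    X = np.asarray(X, float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def minmax(X):
    """Column-wise min-max normalization to the bounded range [0, 1]."""
    X = np.asarray(X, float)
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# "Number of children" vs. "Annual household income": very different scales
X = np.array([[1.0, 20000.0],
              [2.0, 45000.0],
              [4.0, 30000.0],
              [3.0, 25000.0]])
Z = zscore(X)
M = minmax(X)
```

After standardization, every column has mean 0 and standard deviation 1, so the income column no longer dominates Euclidean distances between households.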
Multivariate methods mainly originated in classical statistical analysis, but their use in geographical analysis is extensive. The reason is that geographical studies rely heavily on census, socioeconomic or other large multivariate datasets and consequently deal with either variable reduction or data clustering. In this respect, these methods are crucial for analyzing data better and are necessary in spatial analysis.
5.2 Principal Component Analysis (PCA)
Deﬁnition
Principal component analysis (PCA) is a technique used to summarize multivariate data in fewer interpretable variables called principal components.

A principal component is a linear combination of the original (or standardized) values of the variables and is calculated by extracting the eigenvectors and eigenvalues of the variance–covariance matrix or the correlation matrix (5.4) of matrix A (5.1) (Penn State University[REDACTED PHONE]).

$$PC_i = \vec{X}_j \cdot \vec{e}_i = [X_1\; X_2\; X_3\; \ldots\; X_p] \begin{bmatrix} e_{i1} \\ e_{i2} \\ e_{i3} \\ \vdots \\ e_{ip} \end{bmatrix} = e_{i1}X_1 + \cdots + e_{ip}X_p \quad (5.4)$$

for i = 1 to p (number of components/variables),
for j = 1 to n (number of observations),

where \(\vec{X}_j\) is the vector of the j-th observation of the matrix A, \(\vec{X}_j = [X_1\; X_2\; X_3\; \ldots\; X_p]\), equivalent to \(\vec{X}_j = [a_{j1}\; a_{j2}\; a_{j3}\; \ldots\; a_{jp}]\) (see Eq. 5.1), and \(\vec{e}_i\) is the eigenvector of the i-th component.

The values e_{i1}, e_{i2}, ... of each eigenvector are called principal component coefficients or loadings (Wang[REDACTED PHONE], p.[REDACTED PHONE]).
Why Use
PCA reduces the dimensions (variables) of a multivariate dataset to a set of independent, uncorrelated variables, called the principal components, to make the analysis more comprehensible and interpretation easier, while at the same time preserving most of the information (variation) existing in the dataset (Cangelosi & Goriely[REDACTED PHONE]). The principal components can substitute for the original variables in any subsequent analysis.
Interpretation
Eigenvectors are extracted from the variance–covariance matrix of A (see Section[REDACTED PHONE]). We do not go into the details of how eigenvectors and eigenvalues are computed from the variance–covariance matrix; instead, we focus on their geometric meaning. PCA projects data from their original dimensions onto new ones so that the variation of the data is better explained. The eigenvectors are used to construct the new axes, called principal components, which correspond to the directions (in the original space) with the largest variance in the data (Hamilton[REDACTED PHONE]). Each eigenvector has a corresponding eigenvalue (also called a latent root) that expresses the variability of the corresponding principal component (O'Sullivan & Unwin[REDACTED PHONE], p.[REDACTED PHONE]). A principal component with a low eigenvalue does not explain much of the data variation. The eigenvalues can also be used to draw the standard deviational ellipse, as shown in Figure 5.1D (O'Sullivan & Unwin[REDACTED PHONE], p.[REDACTED PHONE]). The original values can now be projected onto the new dimensions based on the scale factor (eigenvalue) and the new space defined by the components (new axes).
The number of principal components is equal to the number of original variables in the dataset. Each component explains a certain amount of the original variation of the variables. Components are ordered according to their eigenvalues, so the first one explains the largest variability (largest eigenvalue) of the dataset, the second component explains the second-largest variability and so on (O'Sullivan & Unwin[REDACTED PHONE], p.[REDACTED PHONE]). "Explaining the variability (or variation) in the data" stands for the percentage revealing the amount of information of the original dataset retained after the transformation has been applied. A 60% variability explained means that, by data reduction through the PCA transformation, we kept 60% of the initial information, or we lost 40% of the original information. The components are structured so that they are uncorrelated with each other, which is achieved by the orthogonal transformation applied by PCA in the multidimensional space (for this reason, principal components are often used as independent variables for regression analysis and are also used in cluster analysis, as they do not exhibit multicollinearity). In other words, the second component is orthogonal to the first one, the third one is orthogonal to the second one and so on. As the orthogonal arrangement of components is not easily comprehensible in a multidimensional space, let us describe a more straightforward example in two-dimensional space.
Suppose we analyze the average annual "Income" and the average "House size" of 10 spatial units (e.g., postcodes). Using a scatter plot in two-dimensional space (see Figure 5.1A), we observe a diagonal trend between these two variables (see Figure 5.1B). We can calculate the variance σ² in the x-direction and the variance σ² in the y-direction as a measure of the spread of the values. Still, the horizontal and vertical variances do not accurately explain the clear diagonal trend. Instead of calculating the variance along the x- and y-axes, it is better to rotate them so that the x-axis captures the maximum variance of the cloud of data points (observations; see Figure 5.1C). The y-axis will remain orthogonal to x, capturing another proportion of the variance. The new axes are the first and second principal components. The eigenvalues are also used to create the standard deviational ellipse (see Chapter 2), which contains the majority of the data points (see Figure 5.1D). The center of the ellipse is the mean center of the data points. The major axis lies on the first component, and the minor axis, which is orthogonal to the major one, lies on

the second component (see Figure 5.1, PCA graphical example). The length of each axis is the square root of the
corresponding eigenvalue. Original data can now be projected into the new space, and their scores (new coordinates) can be calculated by applying formula (5.4). In this example, the scores of the first two principal components are the coordinates of the original data on the new axes (see Figure 5.1E).
Each subsequent component is orthogonal to the previous one, thus explaining less variability (Troy[REDACTED PHONE]). Therefore, only a few of the first components should be kept: those that explain most of the variability. To calculate the percentage of the variability explained by each component, we use the corresponding eigenvalue. The eigenvalue of each component equals the variance of the data (how much the data vary) along this component (O'Sullivan & Unwin[REDACTED PHONE], p.[REDACTED PHONE]; Wang[REDACTED PHONE]). The proportion of the overall data variance accounted for by the i-th component is given in (5.5):

$$ExplVar_i = \frac{\lambda_i}{\lambda_1 + \cdots + \lambda_p} \quad (5.5)$$

where λ_i is the eigenvalue of the i-th component and p is the total number of components.

The larger the eigenvalue, the more significant a component is, as it captures much of the original data variability. There are many different ways to determine the number of meaningful components to retain, including the cumulative percentage of the total variance, the SCREE plot, the Kaiser–Guttman test, the broken stick model, Cattell's cross-validation, bootstrapping techniques and Bartlett's test for equality of eigenvalues (Cangelosi & Goriely[REDACTED PHONE]). For additional methods, one might refer to the works of Jolliffe ([REDACTED PHONE]) and Jackson ([REDACTED PHONE]).
We describe here the ﬁrst three methods:
• Cumulative percentage of the total variance: In this method, the total variation explained by the final components should be larger than a threshold value (e.g., 70%). Still, for a high-dimensional dataset, achieving a high cumulative value might result in retaining too many components that are hard to interpret.
• SCREE plot: A SCREE plot depicts the fraction of the variance explained by each additional component. The location on the graph where additional components do not significantly change the explained variance is called the "elbow point," as there is a sharp change in the slope. Components after the elbow point can be discarded (see also the elbow method in Section [REDACTED PHONE] and Figure 5.2).
• Kaiser–Guttman test: If the principal components are derived from a covariance matrix, then those eigenvalues larger than the average of all eigenvalues are retained. In the case where the principal components are derived from a correlation matrix, the average of the eigenvalues is 1; therefore, eigenvalues larger than 1 are retained.
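The first and third rules are easy to script. A small Python sketch (the eigenvalues below are hypothetical, chosen to sum to p = 5 as in a correlation-matrix PCA):

```python
import numpy as np

# Hypothetical eigenvalues from a correlation-matrix PCA of p = 5 variables
# (they sum to 5, the number of variables).
eigvals = np.array([3.1, 1.4, 0.25, 0.15, 0.10])
expl = eigvals / eigvals.sum()
cum = np.cumsum(expl)

# Cumulative-percentage rule: fewest components whose cumulative share
# of explained variance reaches the threshold (70% here).
k_cum = int(np.searchsorted(cum, 0.70)) + 1

# Kaiser-Guttman rule (correlation matrix): keep eigenvalues larger than 1.
k_kaiser = int((eigvals > 1.0).sum())

print(k_cum, k_kaiser)
```

With these values both rules agree on retaining two components; on real data the rules can disagree, which is why the choice remains a judgment call.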

The selection of the appropriate principal components to be retained depends
on the problem studied and the research questions. Too many components
make interpretation hard, while too few components may be insufficient if the explained variance is small. A low explained variance might be the result of
many highly skewed variables. Moreover, mixing too many conceptually different variables may lead to low pairwise correlations. In this case, principal components can hardly capture a large share of the variance. When pairwise correlations among variables do not exceed 0.3 (or are not less than −0.3), PCA may not reach a sufficient level of explained variance. The reason is that PCA transforms correlated variables into a new component that summarizes them. In this sense, a good starting point before applying PCA is to check pairwise correlations and potentially remove those variables that have moderate to low correlation (for example, remove those with r < [REDACTED PHONE]). However, setting aside some variables may substantially alter PCA results, and so which variables to retain ultimately depends on the scope of the analysis.
If variables are not measured in the same units, or when variables' values exhibit large differences in their range, the data (matrix A) should be standardized. Standardization is commonplace in social analysis, as social data are characterized by large differences in their scales.
Discussion and Practical Guidelines (Workflow and Spatial Data)
Within a geographical context, PCA is deployed in five steps (see Box 5.1):
Step 1. Standardize dataset (if needed). Apply descriptive statistics to check if the variables are of different scales (or if different units are used). If so, standardize the data by calculating z-scores (see Section 2.4). To standardize data, we can also use the pairwise correlation matrix, which is scale independent. The covariance matrix is scale dependent and should be used if data are of similar scales.

Figure 5.2 Scree plot of explained variance over the number of components.

Step 2. Compute: (a) the eigenvectors (loadings of principal components),
(b) the principal component scores and (c) the eigenvalues-variance
(latent roots).
Step 3. Select the number of principal components to retain using an
appropriate method (e.g., by constructing a SCREE plot).
Step 4. Interpret principal components. PCA is a descriptive technique and is not based on hypothesis testing. As such, PCA interpretation largely depends on how we describe the corresponding components. We label each component by disregarding those loadings below a certain threshold. A rule of thumb is to keep those loadings larger than 0.3 or smaller than −0.3. We have to underline that principal components may yield results with no meaningful interpretation. Although the produced components are neatly organized combinations of variables, sometimes this does not equate to anything in reality, which is a major drawback of PCA. Scatter plots of the first two principal components may also be used.
Step 5. Map scores in geographical space. This step is not included when PCA is applied to nonspatial data. Mapping does not refer to a typical
scatter plot of the ﬁrst two principal components scores. It refers to the
assignment of scores to spatial units as an extra variable that can be
subsequently analyzed by spatial analysis techniques (e.g., spatial
autocorrelation). Thus, by PCA, we keep the best components that
explain most of the data variation and then map them to further discover
any spatial associations.
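The five steps above can be sketched end to end. A minimal Python version with synthetic data (Box 5.1 uses Matlab; the data, variable names and 90% threshold here are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5                                        # as in the numerical example
A = rng.normal(size=(n, p)) * [1, 10, 100, 2, 5]    # variables on mixed scales

# Step 1: standardize, since the variables use different scales.
Z = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

# Step 2: eigenvectors (loadings), eigenvalues (variances) and scores.
C = np.cov(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]                   # sort components by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Z @ eigvecs                                # formula (5.4): project the data

# Step 3: retain enough components to explain, say, 90% of the variance.
cum = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(cum, 0.90)) + 1

# Step 4: interpret retained components via their loadings;
# |loading| > 0.3 is the rule of thumb used in the text.
strong = np.abs(eigvecs[:, :k]) > 0.3

# Step 5 (spatial data): join scores[:, :k] back to the spatial units as
# new attributes and map them in a GIS.
print(k, scores.shape)
```

Because the toy variables are uncorrelated, most components are needed here; with correlated real-world data, far fewer components typically reach the threshold.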
The primary criticism of PCA is that it transforms the data into an orthogonal space. Real geographic space, however, cannot be accurately transformed this way, as heterogeneity and autocorrelation exist. In addition, the results produce a global summary of the data and are not presented spatially by the method itself.
In the case of vector data, PCA runs on the entire dataset, and geographical effects are not taken into account in the calculations (Demšar et al. [REDACTED PHONE]). The mapping of scores of vector data in a GIS environment does not make PCA a spatial analysis method, as it does not take into account spatial concepts (distance, location, neighborhood) or spatial heterogeneity and autocorrelation in its construction.
In the case of raster data, components are calculated based on measurements referring to each cell. This approach is common in remote sensing, where different raster datasets with the same spatial reference and extent are combined to produce composite indices. For example, we may combine a land use/land cover image, a raster of soil variables, a raster of temperature, a raster of CO2 emissions and a raster of socioeconomic and census data. By applying PCA, we detect the principal components, and we map each one of them as a new raster file. In this way, we depict the spatial distribution of each component's scores directly. PCA applied to raster data is named raster PCA and is typically performed in a GIS environment. Raster PCA handles raster spatial data better but still does not account for any spatial effects. Raster PCA is suitable for combining social data (usually in vector format) with environmental data (typically in raster format) to produce composite indices, usually derived from the first few principal components (Demšar et al. [REDACTED PHONE]).
To account for spatial heterogeneity and spatial autocorrelation when dealing with spatial data, several approaches have been proposed, such as geographically weighted PCA.
Geographically weighted PCA (GWPCA) (Fotheringham et al. [REDACTED PHONE], Charlton et al. [REDACTED PHONE], Harris et al. [REDACTED PHONE]) calculates a local PCA model for every single location based on geographically weighted data of a user-defined neighborhood. Principal components, eigenvalues and eigenvectors are calcu-
lated for every single spatial unit and thus can be mapped and further spatially
analyzed (Dem šar et al. [REDACTED PHONE] ). GWPCA can be used to produce local compos-
ite indices because each local principal component describes the relationships
of the original variables at the speci ﬁc location. Furthermore, GWPCA can be
used as an interpolation technique to obtain eigenvalues and eigenvectors at
unobserved locations. By using GWPCA, we may additionally estimate scores
at locations where data do not exist by generating spatial surfaces of eigen-
values and eigenvectors (Harris et al. [REDACTED PHONE] ). Finally, GWPCA can be used prior
to geographically weighted regression (see Section 6.5 ) to produce uncorrel-
ated compound variables.
Numerical Example
Let's see a numerical example that illustrates the five PCA steps presented earlier. Only a snapshot of the final results is described per step, as the focus lies
on the procedure and the output interpretation. Suppose we have a dataset
(matrix A) describing n= 50 neighborhoods (observations) with p= 5 variables
(columns): Income, Housing Conditions, Crime, Health, Pollution. To perform
PCA, we follow the next steps:
Step 1. Standardize this dataset. The ﬁrst two rows of the standardized
matrix are presented in Table 5.2 .
Step 2. Compute: (a) the eigenvectors (loadings of principal components),
(b) the principal component scores and (c) the eigenvalues-variance
(latent roots).
The results are as follows:
(a) Table 5.3 presents the loadings of the ﬁrst two principal
components.
(b) Principal component scores, calculated based on Equation (5.4):

PC_i = X × e_i = [X1 X2 X3 ... Xp] × [e_i1 e_i2 e_i3 ... e_ip]^T = e_i1·X1 + ... + e_ip·Xp

The general formula for the first principal component score is

PC1 = [REDACTED PHONE] × Income + [REDACTED PHONE] × Housing Conditions + [REDACTED PHONE] × Crime + [REDACTED PHONE] × Health + [REDACTED PHONE] × Pollution

The values of the variables are derived from the standardized matrix (Table 5.2). The score of the first principal component for the first observation is

PC1 = [REDACTED PHONE] × (−0.[REDACTED PHONE]) + [REDACTED PHONE] × (−0.[REDACTED PHONE]) + [REDACTED PHONE] × (−0.[REDACTED PHONE]) + [REDACTED PHONE] × (−0.[REDACTED PHONE]) + [REDACTED PHONE] × (−0.[REDACTED PHONE]) = −0.[REDACTED PHONE]

The score of the second principal component for the first observation is

PC2 = −0.42 × (−0.[REDACTED PHONE]) + (−0.35) × (−0.[REDACTED PHONE]) + [REDACTED PHONE] × (−0.[REDACTED PHONE]) + [REDACTED PHONE] × (−0.[REDACTED PHONE]) + [REDACTED PHONE] × (−0.[REDACTED PHONE]) = −0.256

Table 5.2 Standardized matrix.

Observations (neighborhoods) | Income (X1) | Housing Conditions (X2) | Crime (X3) | Health (X4) | Pollution (X5)
1 | −0.[REDACTED PHONE] | −0.[REDACTED PHONE] | −0.[REDACTED PHONE] | −0.[REDACTED PHONE] | −0.[REDACTED PHONE]
2 | [REDACTED PHONE] | [REDACTED PHONE] | [REDACTED PHONE] | [REDACTED PHONE] | [REDACTED PHONE]
... | ... | ... | ... | ... | ...
50 | ... | ... | ... | ... | ...
Table 5.3 Multivariate dataset. Loadings in bold indicate moderate to strong correlations that assist in interpreting principal components.

Eigenvectors (e_i) (Loadings)
Variables | First principal component | Second principal component
Income | [REDACTED PHONE] | −0.42
Housing Conditions | [REDACTED PHONE] | −0.35
Crime | [REDACTED PHONE] | 0.59
Health | [REDACTED PHONE] | 0.18
Pollution | [REDACTED PHONE] | [REDACTED PHONE]

Likewise, we calculate the scores of all components (whether or not they are finally retained) for all observations. Scores are stored in a matrix with as many rows as the observations and as many columns as the variables (in this example, 50 × 5; see Table 5.4). The scores matrix is the representation of the standardized A in the principal component space.
(c) Eigenvalues-variance (latent roots).
The eigenvalues, the explained variance and the cumulative explained variance are presented in Table 5.5. Based on Equation (5.5), the first principal component explains [REDACTED PHONE]% of the dataset's variance, while the first and second principal components together explain [REDACTED PHONE]% (the cumulative explained variance of each row is the sum of the explained variance of that row and the rows above it; see Table 5.5). Note that since we standardized the data, the variance of each variable equals 1. As a result, the sum of all eigenvalues (total variation) should be p = 5 (as many as the variables).
Step 3. Select the number of principal components.
Based on the explained variance (see Table 5.5), we create the SCREE plot to select the number of principal components to finally keep (see Figure 5.2). From graphical inspection of the plot, there is a sharp change in slope (elbow criterion) at component 3. Still, we do not keep this component, as it only accounts for an additional 4.6% of explained variance (see Table 5.5). It is more rational to retain only the first two principal components, as they capture more than 90% of the total variance. It would not be wrong, though, if we included the third component. It depends on the loadings and on the scope of the analysis.

Table 5.4 Scores table.

Observations | PC1 | PC2 | PC3 | PC4 | PC5
1 | −0.[REDACTED PHONE] | −0.[REDACTED PHONE] | ... | ... | ...
2 | ... | ... | ... | ... | ...
... | ... | ... | ... | ... | ...
50 | ... | ... | ... | ... | ...

Table 5.5 Eigenvalues and explained variance.

Component (i) | Eigenvalue (λi) (latent) | Explained variance: λi/Sum = λi/5 | Cumulative explained variance
1 | [REDACTED PHONE] | [REDACTED PHONE] | [REDACTED PHONE]
2 | [REDACTED PHONE] | [REDACTED PHONE] | [REDACTED PHONE]
3 | [REDACTED PHONE] | [REDACTED PHONE] | [REDACTED PHONE]
4 | [REDACTED PHONE] | [REDACTED PHONE] | [REDACTED PHONE]
5 | [REDACTED PHONE] | [REDACTED PHONE] | [REDACTED PHONE]
Sum | 5 | 1 |
Step 4. Interpret principal components.
To label components, we define two thresholds. Variables with loadings larger than 0.3 are considered positively correlated, while variables with loadings smaller than −0.3 are considered negatively correlated (see Table 5.3). The first principal component is a measure of wealth, as it is highly correlated with both income and housing conditions (the remaining variables have lower loadings and thus are not included in the description of the component). In other words, neighborhoods with high income are more likely to exhibit good housing conditions. The positive relation (positive sign) reveals that an increase in one variable will lead to an increase in the other.
The second principal component is a measure of deprivation. There is a negative correlation with income and housing conditions and a positive correlation with crime and pollution. Neighborhoods with low income and inadequate housing conditions tend to have high crime rates and more pollution.
The scores of the first and second principal components of each observation (see Table 5.4) can be plotted in a scatter plot (for example, for the first observation, the coordinates in the scatter plot are (−0.[REDACTED PHONE], −0.[REDACTED PHONE])). Plotting the coordinates assists in further describing the first two principal components.
Step 5. Map scores in geographical space.
Create a choropleth map depicting the scores of the ﬁrst (or second) princi-
pal component in a GIS environment.
Box 5.1 Matlab. As ArcGIS and GeoDa do not offer tools for PCA, we
present a small example through Matlab. Suppose we have a set of socio-
economic variables ( Data ) for the postcodes of the City and we want to
narrow the data down to a more meaningful dataset. We can easily perform PCA in Matlab. Go to the Matlab folder of Lab 5 to find the data and code to complete a PCA through the steps explained before (standardization; computing eigenvectors, eigenvalues and principal component scores; calculating the explained variance and scree plot; and plotting a scatter plot of the first and second principal components). Run PCA.m.
5.3 Factor Analysis (FA)
Deﬁnition
Factor analysis is a data dimension reduction technique that describes a set of observed variables using fewer unobserved (latent) variables called factors.

Why Use
Factor analysis is used when we need to reduce the existing variables by using
fewer factors that better represent the original ones, thus simplifying the data
structure and the overall analysis (Wang[REDACTED PHONE] p. [REDACTED PHONE]).
Interpretation
The factors attempt to explain the variation of the original data, and they can
be viewed as broad concepts that describe a set of observations. Although FA
is closely related to PCA, it follows a different approach. PCA transforms the
original observed variables, and thus, it can be considered as a mathematical
transformation using linear combinations of the original variables (Demšar et al.
[REDACTED PHONE], Wang [REDACTED PHONE]). On the other hand, factor analysis attempts to capture the variation of the observed variables by assuming the existence of latent variables, the factors, associated with error terms, and for this reason it can be regarded as a statistical process. As FA requires much subjective judgment, it is highly controversial in statistical circles, and it has lately become less common in geographical analysis.
5.4 Multidimensional Scaling (MDS)
Deﬁnition
Multidimensional scaling (MDS) is a technique that reduces the dimensionality of an N-dimensional dataset (N > 2) to two or three dimensions while preserving, to some extent, the relationships (similarities or dissimilarities) among the observations (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]).
Why Use
MDS is useful for visualizing the similarities/dissimilarities of a complex dataset
by mapping them into two or three dimensions.
Interpretation
An MDS algorithm maps objects from an N-dimensional space to a two- or three-dimensional configuration so that the distances between the N-dimensional objects are retained as much as possible. Given pairwise dissimilarities, the algorithm reconstructs a map that preserves distances. The closer two objects lie in the two- or three-dimensional space, the closer they lie in the N-dimensional space as well.
Discussion and Practical Guidelines
Given pairwise dissimilarities, the algorithm reconstructs a map in the two- or three-dimensional space, called an ordination, which preserves the original distances as much as possible (O'Sullivan et al. [REDACTED PHONE] p. [REDACTED PHONE]). The closer two objects lie in the ordination, the closer they lie in the N-dimensional space as well. The ordination is used to effectively identify interesting relationships among objects that are not obvious when using other statistical metrics. Locating objects with common characteristics as sets of points forming clusters in the ordination takes no more than a glance (see Figure 5.3). Dissimilarities can also be located by identifying which objects lie far away from the others. MDS calculates the new positions of the objects based on a distance or dissimilarity metric (e.g., Euclidean, Manhattan) through a matrix of pairwise distances (or dissimilarities) derived from the original dataset. The MDS technique then creates a new point configuration whose inter-point distances approximate the original dissimilarities.
Just as distortions are inevitable when projecting from 3-D space to a 2-D plane (e.g., projecting points on the earth's surface to a map), MDS also introduces distortions to the original N-dimensional data. This distortion is called stress. The method proceeds iteratively, mapping the original data to two or three dimensions so that the stress function is minimized. Although stress might be large in some cases, MDS enjoys wide acceptance, especially for concept mapping, where ideas or other interesting concepts are spatialized to reflect similarities and differences.
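The book demonstrates MDS in Matlab (Box 5.2); the idea can also be sketched in Python with classical (metric) MDS, which embeds objects via an eigen-decomposition of the double-centered squared-distance matrix. The four 5-dimensional profiles below are toy values, not the islands of Table 5.6:

```python
import numpy as np

# Hypothetical 5-dimensional profiles for four objects: two similar pairs.
X = np.array([[9.0, 1.2, 4.0, 20.0, 30.0],
              [8.5, 1.1, 4.2, 22.0, 28.0],
              [1.0, 0.3, 7.0, 40.0,  5.0],
              [1.2, 0.4, 6.8, 38.0,  6.0]])

# Pairwise Euclidean distances in the original 5-D space.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Classical MDS: double-center the squared distances ...
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J

# ... and embed using the top-2 eigenpairs: the ordination coordinates.
eigvals, eigvecs = np.linalg.eigh(B)
idx = np.argsort(eigvals)[::-1][:2]
Y = eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

# Objects 1-2 and 3-4 should form two well-separated pairs in the ordination.
d2 = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.round(d2, 2))
```

Iterative MDS variants instead minimize the stress function directly, but for Euclidean input distances the classical solution is a standard starting configuration.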
Let's see a brief example of seven Greek tourist islands, described by five variables (dimensions): the number of tourists who visited the island the previous summer, the average money spent per tourist, the average nights per stay, the percentage of tourists coming from Italy and the percentage of tourists coming from the UK (see Table 5.6, Box 5.2). To identify any similarities and dissimilarities among the islands pertaining to the tourism industry, we conduct MDS (see Figure 5.3).
The stress value is [REDACTED PHONE], reflecting relatively low distortion. Results show that Santorini and Mykonos are more similar compared to Hydra and Amorgos. Also, Corfu and Zante have more similarities. Crete is entirely different from Amorgos (inspect Table 5.6 in comparison to the ordination in Figure 5.3). By MDS and ordination, we get a quick view of similarities and dissimilarities among the observations that would not be apparent by just inspecting Table 5.6.
Figure 5.3 MDS ordination. Reducing from five to two dimensions.

Box 5.2 Matlab. You can easily reproduce the preceding representation
and analysis in Matlab. Go to the Matlab folder of Lab 5 to ﬁnd data and
code. Run MDS.m.
5.5 Cluster Analysis
Cluster analysis is a process in which the objects (observations) of a dataset are grouped into a number of clusters. Clusters are formed on the basis that objects within a cluster are as similar as possible (with respect to their attributes/characteristics), while objects belonging to different clusters are as dissimilar as possible.
The formation of the clusters is based on a distance matrix between all observations in the dataset (inter-observation distance; see Eq. 5.2), the dissimilarity matrix. Most of the time, before calculating the dissimilarity matrix, the data should be rescaled so that variables do not depend on the measurement scale and are comparable with each other. Failing to rescale the data leads to assigning disproportionately more importance to variables with significantly larger values relative to the others. Standardization is preferred to normalization, as it better retains the importance of each variable because it does not bound values to a fixed range. For example, in the case of outliers, normalized data are squeezed into a small range, and as such, when dissimilarities (through statistical distances) are calculated, they contribute less to the final values. Still, rescaling is not always desirable. If we have data of similar scales or proportions (e.g., percentages), or if we want to give more weight to the variables with larger values, we might not consider normalizing, adjusting or standardizing. The decision on which rescaling type to apply depends on the type of clusters we wish to shape and the dataset available (see Section 2.4).

Table 5.6 Five-dimensional dataset (five variables) related to the tourist industry of the Greek islands.

ID | Island name | Tourists (in hundred thousand) | Money spent (in thousand Euros) | Average nights per stay | Percentage of tourists from Italy | Percentage of tourists from the UK
1 | Mykonos | [REDACTED PHONE]
2 | Crete | [REDACTED PHONE]
3 | Santorini | [REDACTED PHONE]
4 | Corfu | [REDACTED PHONE]
5 | Zante | [REDACTED PHONE]
6 | Hydra | [REDACTED PHONE]
7 | Amorgos | [REDACTED PHONE]

Cluster analysis reduces the number of observations of a multivariate matrix A (5.1) by grouping them, in contrast to the previous methods (PCA, FA, MDS), which reduce the number of variables. In this section, two major nonspatial clustering techniques are presented, namely hierarchical clustering and partitioning clustering (k-means clustering). Section 5.6 presents spatial clustering methods.
[REDACTED PHONE] Hierarchical Clustering
Deﬁnition
Hierarchical clustering is an unsupervised method of grouping data by building a nested hierarchy of clusters (O'Sullivan & Unwin [REDACTED PHONE] p. [REDACTED PHONE]). It is based on the creation of a tree-based representation of the data, which is called a
dendrogram. There are two approaches to perform hierarchical clustering,
namely agglomerative (bottom-up) and divisive (top-down).
Methods
Agglomerative hierarchical clustering starts from the bottom, assigning each object (also called a leaf node) to its own separate cluster (see Figure 5.4). At
this stage, each cluster contains only one object (member). In the second cycle,
pairs of objects that are more similar (see similarity/dissimilarity measures later
in this section) are grouped, creating clusters with two objects each. In the third
cycle, each cluster (formed in the previous cycle) is joined with the one that is
more similar, and a new set of clusters is created using a linkage method
Figure 5.4 Dendrogram components for the agglomerative hierarchical clustering algorithm.
(cluster of clusters). The procedure continues until all clusters are merged into one at the top of the tree.
Divisive hierarchical clustering operates in the opposite direction from the agglomerative method. It works in a top-down manner, considering in its first cycle all objects as belonging to a single cluster. At each following cycle, the most heterogeneous cluster is split into two clusters, until each object belongs to its own cluster.
Dendrogram
Whichever approach is used, the results are presented in a tree-like diagram called a dendrogram, depicting the hierarchical relationships between clusters (see Figure 5.4). The leaves of the tree represent the observations (first level: each object belongs to its own cluster). Going from the bottom up, every two merged clusters create a new cluster at a higher level. Clusters that are merged create branches. The height of each branch (between the two original clusters and the merged new one) represents the distance between the two original clusters. This height is also called the cophenetic distance. The taller the branch, the larger the distance between the merged clusters. The horizontal line that connects two branches is called a link. We inspect only the heights in a dendrogram, not the horizontal distances, which are used only for arranging the clusters in the horizontal direction.
Distance Metrics
The agglomerative hierarchical algorithm is based on the creation of (a) a
dissimilarity matrix created through a measure of distance for calculating pair-
wise distances and (b) a linkage method for calculating inter-cluster distances.
Dissimilarity Matrix (Pairwise Distance)
The dissimilarity matrix is a square symmetric distance matrix whose elements correspond to the pairwise distances (also called inter-observation distances) between the observations. Any distance metric can be used (see Chapter 1), such as:
• Euclidean distance
• Manhattan distance
• Minkowski distance
• Pearson correlation distance
• Kendall correlation distance (for rank-based correlation analysis)
Euclidean and Pearson correlation distances are quite common in socioeconomic analysis. When the Euclidean distance is applied, clustering is based on whether or not objects have similar values. On the other hand, Pearson correlation distance considers two observations similar if they are highly correlated. A high correlation between objects does not necessarily mean that their Euclidean distance is small as well. There are cases where observations are highly correlated but lie far apart in terms of their Euclidean distance. Pearson correlation distance is particularly helpful when we are interested in clustering not magnitudes but relationships. For instance, in a consumer segmentation analysis, we may want to target subgroups that have the same attitudes/preferences (e.g., buy similar things) but have different socioeconomic profiles. In this case, we are interested in finding highly correlated variables, and for this reason, correlation-based distances are more suitable. Keep in mind that in a complex analysis, we should test our dataset with more than one distance metric to get better insight.
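The contrast between the two metrics is easy to see with two hypothetical consumer profiles that have the same shape but very different magnitudes (the values below are made up for illustration):

```python
import numpy as np

# Two hypothetical profiles: identical shape, different magnitudes (b = 10a).
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Euclidean distance: large, because the magnitudes differ greatly.
euclidean = np.linalg.norm(a - b)

# Pearson correlation distance (1 - r): zero, because the profiles are
# perfectly correlated.
r = np.corrcoef(a, b)[0, 1]
corr_dist = 1.0 - r

print(euclidean, corr_dist)
```

A Euclidean-based clustering would place these two observations in different clusters, while a correlation-based one would group them together.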
Linkage Methods (Inter-Cluster Distance)
Apart from calculating the pairwise distances, we calculate the distances among
clusters (inter-cluster distances). The inter-cluster distance is used to merge
similar clusters in each subsequent step of the algorithm. In this case, clusters
have more than one member and comparison is not straightforward as in the
case of calculating pairwise distances among single observations. Linkage
methods to calculate the inter-cluster distance between two clusters include:
/C15 Single linkage: The distance between two clusters equals the min-
imum distance of an observation in one cluster with another observa-
tion in the other cluster (nearest n eighbors between clusters). It
produces unbalanced clusters and is not that widely used (Wang[REDACTED PHONE] p. [REDACTED PHONE]). Single linkage is approp riate when the two clusters are
well separated.
/C15 Average linkage: The distance between two clusters is the average of all
pairwise distances (all pairs of observations) in the two clusters. The aver-
age linkage is based more on a measure of central location.
/C15 Centroid linkage: The distance between two clusters is the Euclidean
distance between the centroids (means) of the two clusters. Like average
linkage, it provides results based on central location. This method is
appropriate when outliers exist (Wang[REDACTED PHONE] p. [REDACTED PHONE]).
/C15 Complete linkage: The distance between two clusters equals the most
distant observations between these clusters (furthest neighbor). This
method forms clusters with similar diameters ensuring that all observa-
tions inside a cluster lie within the maximum distance. It is appropriate for
producing compact clusters.
/C15 Ward ’s linkage: The distance between two clusters is the sum of
squared deviations from each point to the centroid of each cluster.
This method attempts to minimize the within-cluster variance and to pro-
duce nearly spherically shape clusters with a similar number of observations.
Standardization of the data is also necessary where the scales or units of measurement differ across variables, and it should be applied before any distance matrix calculations.
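The two ingredients (a dissimilarity matrix and a linkage method) map directly onto SciPy's hierarchical clustering tools. A sketch with two synthetic, well-separated groups (the group means, sizes and Ward's linkage are my own illustrative choices):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
# Two hypothetical groups of 10 observations each, measured on 3 variables.
X = np.vstack([rng.normal(0.0, 0.3, size=(10, 3)),
               rng.normal(5.0, 0.3, size=(10, 3))])

# Standardize before computing distances (variables may differ in scale).
Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# (a) Dissimilarity matrix: condensed pairwise Euclidean distances.
D = pdist(Xz, metric='euclidean')

# (b) Linkage method: Ward's, which minimizes within-cluster variance.
Z = linkage(D, method='ward')

# Cut the resulting dendrogram into two clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```

With `scipy.cluster.hierarchy.dendrogram(Z)` the same linkage matrix can be drawn as the tree diagram described above.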
Choosing the Number of Clusters
For many problems, the number of clusters to partition a dataset is predefined.
For instance, in a geomarketing analysis, we may want to segment georeferenced data (e.g., socioeconomic variables referring to postcodes) into two clusters, namely high spenders and low spenders. However, in most real case studies, it is hard to determine in advance the optimal number of clusters that best partitions a dataset. More broadly, each clustering method should ultimately produce clusters that are (Griffith et al. [REDACTED PHONE]):
• Similar in size (containing relatively similar numbers of objects)
• Not overlapping in shape (distinct boundaries among clusters)
• Internally homogeneous (observation values are close together inside each cluster)
More partitions than necessary lead to overlapping clusters with small between-cluster differences (meaning that some clusters might be quite similar). Fewer clusters than necessary lead to nonhomogeneous clusters with large within-cluster dissimilarities.
For hierarchical clustering, we may select the optimal number of clusters by inspecting the dendrogram's overall appearance (shape) and verifying its consistency.

Figure 5.5 Inconsistency coefficient and links.

By verifying consistency, we compare the height of each link with
the height of the links at lower levels. When heights are similar, the merged clusters
are not that distinct, and they create a uniform new cluster, as the distances among
points before and after merging are similar. In other words, these clusters
are consistent. If the height differences between links are large, the merged clusters are
not consistent, as the objects of the merging clusters lie far apart from each other.
In such a case, it is better not to merge the two clusters, as the new one would contain
objects with high dissimilarity. This is the point where the cutoff for
the optimal number of clusters lies, and we keep the clusters formed right before it (by
trimming the dendrogram). A method to locate this point is the
inconsistency coefficient (Jain & Dubes). The inconsistency coefficient compares
the height of each link with the adjacent links that lie below it at a certain
depth. The depth is usually one or two levels down the hierarchy tree (see
Figure 5.5). A high inconsistency coefficient reveals that the merging clusters
are not homogeneous, while a low inconsistency coefficient reveals that the clusters
can be merged.
By inspecting the inconsistency coefficient, we can select a cutoff value that
splits the dendrogram into two parts and keep those clusters that lie below the
cutoff level (Figure 5.6).
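The dendrogram cut described above can be sketched with SciPy's hierarchical clustering tools. The data here are illustrative (two loose groups), not the book's case study:

```python
# Sketch: cutting a dendrogram with the inconsistency coefficient (SciPy).
import numpy as np
from scipy.cluster.hierarchy import linkage, inconsistent, fcluster

rng = np.random.default_rng(0)
# Two well-separated groups of 2-D observations (illustrative data)
X = np.vstack([rng.normal(0, 0.5, (5, 2)), rng.normal(5, 0.5, (5, 2))])

Z = linkage(X, method='ward')      # hierarchical tree (linkage matrix)
print(inconsistent(Z, d=2))        # inconsistency of each link, depth 2

# Keep the clusters formed before any link exceeds the cutoff
# (here 0.8, as in Figure 5.6)
labels = fcluster(Z, t=0.8, criterion='inconsistent', depth=2)
print(labels)
```

Because the final merge joins two distant groups, its inconsistency exceeds the cutoff and the two groups are kept separate.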
Verify Cluster Dissimilarity
Apart from defining the optimal number of clusters, we should also verify the
quality of the partitions regarding their similarity/dissimilarity. A method to verify
cluster similarity is the cophenetic correlation coefficient, which computes the
linear correlation coefficient between the cophenetic distances (the height of the link
at which two objects are first merged) and the original distances among objects
(Sokal & Rohlf). A strong cophenetic correlation reveals that the dendrogram
represents the dissimilarities among observations accurately, while a low
cophenetic correlation reveals an invalid clustering. The cophenetic correlation
coefficient is very useful when we compare clustering results using different
distance measures or different linkage methods. For each distance (or
linkage method), we can calculate the cophenetic correlation coefficient and
keep the distance metric (or linkage method) that yields the higher one. We
have to emphasize that clustering results should be meaningful with respect to
our research scope. For this reason, we might sometimes apply a distance
metric that produces a slightly lower cophenetic correlation coefficient than
another, if it leads to an easier (or more rational) interpretation of the clusters
(see Box 5.3).
Box 5.3 Matlab. You can easily reproduce the preceding representation
and analysis in Matlab by using the first two numeric columns of Table 5.6.
Go to the Matlab folder of Lab 5 to find data and code. Run HC.m.
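The cophenetic comparison across linkage methods can also be sketched in Python with SciPy (illustrative random data, not the book's Table 5.6):

```python
# Sketch: comparing linkage methods with the cophenetic correlation
# coefficient (SciPy).
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))       # illustrative observations
D = pdist(X)                       # original pairwise distances

for method in ('single', 'complete', 'average', 'ward'):
    c, _ = cophenet(linkage(X, method=method), D)
    print(f'{method:8s} cophenetic correlation = {c:.3f}')
# Keep the linkage whose dendrogram best preserves the original
# distances (highest coefficient), unless another yields clusters
# that are easier to interpret.
```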

k-Means Algorithm (Partitional Clustering)
Definition
k-means is a clustering algorithm that partitions a dataset of n observations
(x_1, x_2, ..., x_n) into k clusters C = {c_1, c_2, ..., c_k} by minimizing the
within-cluster sum of squares (MacQueen, Pena et al.; see
Figure 5.7). The goal is to identify the cluster centers μ_i, i = 1...k, that minimize
the function (5.6):

argmin_μ ∑_{i=1}^{k} ∑_{x∈c_i} ||x − μ_i||²    (5.6)
Figure 5.6 Cutoff inconsistency value of 0.8. By setting this cutoff value, we keep clusters
that do not exceed this inconsistency. Two clusters are finally created: Cluster 1, consisting
of objects [1,3,6,7], and Cluster 2, consisting of objects [2,5,4]. If these objects refer to
spatial entities (e.g., postcodes), we can easily map the two clusters using two distinct
colors by assigning, for example, 1 (in a new column labeled "CLUSTER" in the attribute
table) to objects belonging to Cluster 1, and 0 to those belonging to Cluster 2.

where
c_i is the set of observations that belong to cluster i,
μ_i is the mean of the observations in c_i, and
k ≤ n.
The k-means algorithm is an unsupervised machine-learning algorithm, as it does not
require training (a set of pre-classified features that offers some initial knowledge
of the clustering schema). Computing a k-means algorithm involves the
following steps (Arthur & Vassilvitskii):
1) The initialization step. The selection/calculation of the initial cluster
centers μ_i (also called seeds), which should be k in total:

μ_i = some initial value, i = 1...k    (5.7)
2) The assignment step. Each object is assigned to the closest (in the data
space) cluster center.
3) The calculation of the new cluster centers. For each cluster created, the
mean center is calculated:

μ_i = (1/|c_i|) ∑_{x_j∈c_i} x_j    (5.8)

where |c_i| is the total number of objects in cluster c_i.
The new cluster centers may not be existing observations.
4) The reassignment step. The algorithm returns to step 2, and each object
is reassigned to the closest center. Steps 2 and 3 are repeated until the
assignments stabilize (i.e., they do not change from one iteration to the next)
or a maximum number of iterations is reached.

Figure 5.7 (A) Observations before clustering. (B) Observations clustered into two groups,
namely K and L. The k-means algorithm attempts to form clusters in which (a) objects
within a cluster are as close as possible (close to the cluster center) and (b) objects in
a cluster are as far as possible from objects of other clusters.
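The four steps above can be sketched as a minimal NumPy implementation (a sketch with Forgy initialization and illustrative data; not the ArcGIS or GeoDa implementation):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal Lloyd's k-means: Forgy init, assign, update, repeat."""
    rng = np.random.default_rng(seed)
    # 1) Initialization: pick k observations as seeds (Forgy method)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # 2) Assignment: each object goes to the closest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # 3) Update: new center = mean of the objects in each cluster
        #    (keep the old center if a cluster happens to empty)
        new_centers = np.array([X[labels == i].mean(axis=0)
                                if np.any(labels == i) else centers[i]
                                for i in range(k)])
        # 4) Stop when the centers (hence assignments) no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
labels, centers = kmeans(X, k=2)
print(centers)
```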
In practice, to find the optimal clusters, we would have to try all possible partitions.
As this is infeasible, especially when the number of variables and observations
is large, we turn to heuristic algorithms. The k-means heuristic algorithm
does not guarantee reaching a global minimum, but it yields a near-global
minimum. The global minimum is the true minimum of the objective function over the dataset.
A near-global minimum means that the algorithm reaches a solution that we
hope is close enough to the global minimum. We anticipate that a robust
algorithm approximates the globally optimal solution.
Why Use
The k-means algorithm is used to group a set of observations into clusters with similar
(homogeneous) characteristics.
Interpretation
Objects of the same cluster are more similar to each other than to objects in other
clusters. When referring to spatial data, clusters can be visualized using
typical GIS maps: polygons belonging to the same cluster are rendered with
the same color.
Discussion and Practical Guidelines
Three main settings should be defined before the k-means algorithm is run:
(a) the initialization method, (b) the number of clusters k into which the dataset will be
partitioned and (c) the variables to include.
• Select the initialization method: There are various initialization
methods. The most commonly used are the Random Partition and
Forgy methods (Pena et al.). The Random Partition method assigns
a cluster randomly to each object and then calculates the cluster centers.
The Forgy method chooses k objects from the dataset as the initial
cluster centers (also called seeds) and then assigns each of the remaining
objects to the closest seed. Another method (algorithm), called k-means++,
selects an object randomly as the first cluster center (Arthur & Vassilvitskii).
Each subsequent cluster center is selected with a
probability proportional to the squared distance of each observation
to its nearest cluster center. Seeds lying farther away in the
data space from the previously selected seeds are thus favored.
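The k-means++ seeding rule just described can be sketched in a few lines of NumPy (illustrative data; a sketch of the rule, not the reference implementation):

```python
# Sketch of k-means++ seeding: each new seed is drawn with probability
# proportional to its squared distance from the nearest seed so far.
import numpy as np

def kmeans_pp_seeds(X, k, seed=0):
    rng = np.random.default_rng(seed)
    seeds = [X[rng.integers(len(X))]]      # first seed: uniform at random
    while len(seeds) < k:
        # squared distance of every observation to its nearest seed
        d2 = np.min([np.linalg.norm(X - s, axis=1)**2 for s in seeds],
                    axis=0)
        probs = d2 / d2.sum()              # far-away points are favored
        seeds.append(X[rng.choice(len(X), p=probs)])
    return np.array(seeds)

X = np.random.default_rng(3).normal(size=(50, 2))
print(kmeans_pp_seeds(X, k=3))
```

Because an already chosen seed has zero squared distance to itself, it can never be drawn twice.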
• Select the number of clusters k: To run the k-means algorithm, we
should first set the number of clusters k. Most of the time, the number
of clusters into which a dataset should be partitioned is not known in advance,
and a circularity emerges: we need to know k to partition the dataset
into k clusters, but we also need to identify the
number of clusters that partitions the dataset in an optimal way. The
choice of the appropriate number of clusters is not trivial and strongly
depends on the problem at hand and the data available. Various methods
exist to define the number of clusters into which a dataset should be partitioned
to produce well-separated and homogeneous clusters. We can rely on
previous knowledge or on specific requirements for the number of clusters
to be used. We may also conduct several segmentations with incremental
numbers of clusters and keep as the most appropriate k the one that
yields the most meaningful results with respect to our analysis.
Another approach is the elbow method. The elbow method plots
the percentage of variance explained (the ratio of the between-cluster
variance to the total variance) as a function of an increasing number of clusters
(similar to PCA and the use of a scree plot; see Figure 5.2). The
point on the graph at which adding an extra cluster does not explain much
more of the data can be considered the appropriate number of clusters k (see
Figure 5.8). Put simply, the first cluster explains a lot of the variance, and
the second cluster adds extra information explaining additional variance,
but at some point an additional cluster will add little newly explained variance,
and the graph will significantly change its slope. This point is
called the "elbow" (due to the sharp slope change). From this point
onward, additional clusters do not significantly improve the variance
explained. We select as the optimal k the one that corresponds to the
elbow point.
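The elbow curve can be sketched with scikit-learn's KMeans, whose `inertia_` attribute is the within-cluster sum of squares (illustrative data with three groups; assumes scikit-learn is installed):

```python
# Sketch of the elbow method: the share of variance explained
# (between-cluster / total) against k; look for the slope break.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(c, 0.4, (30, 2)) for c in (0, 4, 8)])  # 3 groups

tss = ((X - X.mean(axis=0))**2).sum()        # total sum of squares
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    explained = 1 - km.inertia_ / tss        # between-cluster share
    print(f'k={k}: {100*explained:.1f}% of variance explained')
# For these three groups, the gain should flatten near k = 3 (the elbow).
```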
The Caliński–Harabasz pseudo F-statistic is another method to select
the appropriate number of clusters. It is a measure of separation between
the clusters, reflecting within-cluster similarity and between-cluster
difference (Caliński & Harabasz). The most appropriate k is the one
with the highest pseudo F-statistic value (see Figure 5.9; ESRI).

Figure 5.8 Elbow criterion. The location where the graph has a sharp change in its slope is
called the elbow (here marked with a circle), and this point indicates the appropriate
number of clusters k.
The Akaike information criterion (AIC; Akaike) and the Bayesian information
criterion (BIC; Schwarz) can also be used to determine the
optimal k. In both cases, the lowest value reflects the optimal clustering
(see Box 5.4).
Box 5.4 The preceding criteria belong to a wider category of measures
named validation measures/criteria (for more details, see Halkidi et al.).
The purpose of calculating validation measures is twofold: first,
validation measures evaluate the results of a clustering algorithm (shape/
correctness of clusters) and, second, they assist in assessing the optimal number
of clusters (Grekousis & Hatzichristos). This process is also called
cluster validity. Validation measures can also be used to compare results
across different clustering algorithms. Two measurement criteria used to
determine the optimal number of groups and the clustering scheme in
general are the compactness and the separation of the clusters. Objects in
each cluster should be compact (as close to each other as possible);
variance is a standard measure of compactness. Moreover, clusters should
lie as far apart as possible and be well separated. In this respect, a reliable
validation measure should consider both the compactness and the
separation of a cluster.

Figure 5.9 Pseudo F-statistic criterion to select the number of clusters. The highest
pseudo F-statistic value signifies large separation among clusters and indicates an
appropriate number of clusters. In this example, the pseudo F-statistic is highest
for k = 2.
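The pseudo F-statistic is available in scikit-learn as `calinski_harabasz_score`; a sketch on illustrative two-group data (assumes scikit-learn is installed):

```python
# Sketch: choosing k with the Caliński–Harabasz pseudo F-statistic.
# The highest value indicates the most appropriate k.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(c, 0.4, (30, 2)) for c in (0, 5)])  # 2 groups

scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)
    print(f'k={k}: pseudo F = {scores[k]:.1f}')
print('best k =', max(scores, key=scores.get))
```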
• Select variables: Of the large number of variables that a database
contains, only a small fraction is usually necessary to define the cluster structure
(Brusco & Cradit). The choice of which variables to retain and which
"masking" variables to eliminate is not easy. A masking variable is a variable
that does not define a true cluster structure and may obscure the clustering
analysis (Brusco & Cradit). There are two broad approaches to tracing
masking variables, namely variable weighting and variable selection (Gnanadesikan
et al.). The variable weighting method assigns a weight to each
variable by minimizing a measure of stress. The variable selection method
assigns a weight of 1 to the selected variables and 0 to the masking variables.
The simplest way to select a variable is to rely on previous knowledge or on
conceptual relevance to the scope of the study (Grekousis & Thomas).
For example, in geodemographic clustering, education would be
more relevant than a person's height. A simple approach is to
start with a relatively small number of variables, conceptually related to the
problem, and to add new ones successively. If the validation measures
and the clustering scheme improve, then we keep the newly added
variables; otherwise, we drop them. Variables that exhibit
multicollinearity may also be removed.
An additional method is to use the R-squared statistic (R²), which is
calculated for each variable (5.9):

R² = (TSS − ESS) / TSS    (5.9)

where
TSS (total sum of squares) is the total sum of squared deviations from
the global mean value of a variable.
ESS (explained sum of squares) is the sum of squared deviations from
the mean value of the group each observation belongs to.
R² reflects how much of the original variation of the variable (for the
entire dataset) has been retained after clustering (ESRI). The higher
the R², the better a variable divides the original dataset into meaningful
clusters. On the other hand, a variable with a low R² retains little of its
variance across the groups.
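Equation (5.9) can be sketched directly in NumPy. The data are illustrative: variable 0 separates two hypothetical groups, while variable 1 is pure noise:

```python
# Sketch of Equation (5.9): R² per variable after clustering, from the
# total (TSS) and within-group (ESS) sums of squares.
import numpy as np

rng = np.random.default_rng(6)
g1 = np.column_stack([rng.normal(0, 1, 50), rng.normal(0, 1, 50)])
g2 = np.column_stack([rng.normal(10, 1, 50), rng.normal(0, 1, 50)])
X = np.vstack([g1, g2])
labels = np.repeat([0, 1], 50)        # cluster assignment

for j in range(X.shape[1]):
    x = X[:, j]
    tss = ((x - x.mean())**2).sum()
    ess = sum(((x[labels == c] - x[labels == c].mean())**2).sum()
              for c in np.unique(labels))
    r2 = (tss - ess) / tss
    print(f'variable {j}: R² = {r2:.2f}')
# Variable 0 keeps most of its variance across the groups (high R²);
# variable 1 contributes little — a candidate "masking" variable.
```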

Finally, variables should be rescaled before calculating the dissimilarity
matrix so that they do not depend on the measurement scale and are comparable
with each other. Failing to rescale the data assigns disproportionately
more importance to variables with significantly larger values
than to the others. Standardization is preferred to normalization,
as it better retains the importance of each variable because its values are not
bounded to a fixed range (see Section 2.4).
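The rescaling step above amounts to a z-score transformation; a minimal sketch with hypothetical values (income in currency units versus rooms per dwelling):

```python
# Sketch: z-score standardization before the dissimilarity matrix, so a
# variable in large units (income) does not dominate one in small units
# (rooms). The values are hypothetical.
import numpy as np

X = np.array([[45000., 3.], [52000., 4.], [38000., 2.], [61000., 5.]])
Xz = (X - X.mean(axis=0)) / X.std(axis=0)   # standardization (z-scores)
print(Xz.round(2))
# After standardization each column has mean 0 and standard deviation 1,
# so both variables weigh equally in any distance calculation.
```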
Another aspect we should consider prior to any clustering is outlier detection.
The existence of outliers strongly affects clustering results. To trace outliers, the
methods presented in Chapters 2 and 3 can be applied. From the clustering
point of view, outliers can be treated as clusters with a small number of objects
that lie far away from the rest of the clusters (Jiang et al.). The existence of
outliers occasionally unveils interesting patterns that should be further
investigated.
Finally, the k-means algorithm does not impose any spatial constraints, and it
yields clusters that are not necessarily contiguous in space. For spatially contiguous
clusters, we apply the regionalization methods explained in the next section.
5.6 Regionalization
Definition
Regionalization consists of a set of methods that cluster multivariate spatial
data under spatial constraints. It is a procedure for grouping a large number
of spatial objects into a desirable smaller number of homogeneous clusters that
also occupy contiguous regions in space (Assunção et al.). The spatial
clusters are also called regions. The entire procedure can also be referred to as
zone design (Openshaw).
Why Use
Regionalization methods are applied to cluster spatial features into groups with
similar characteristics that are also spatially contiguous. From the spatial
planning and policy perspective, regionalization is an important process whereby
neighborhoods, census tracts, postcodes, districts or counties are grouped to
form wider homogeneous regions to which policies related to social, educational,
health, environmental or financial issues are applied.
Interpretation
Regionalization methods produce clusters in which (a) features within clusters
are as similar as possible, (b) the between-cluster difference is as large as possible
and (c) clusters are composed of contiguous spatial entities.

Discussion and Practical Guidelines
A wide variety of geographical applications exist, especially in planning and
policy analysis, where we need to create clusters of contiguous spatial entities
that form wider homogeneous, compact and cohesive regions (Alvanides &
Openshaw, Stillwell et al., Photis & Grekousis). Examples
include redesigning the provinces of a state or creating zones to which national
funds will be allocated to promote sustainable development equally. We
might also redesign the neighborhoods of a city based on their demographic
characteristics in order to design new school districts. In marketing analysis, we may
wish to create homogeneous regions of economic activities to promote
targeted policies, or conduct market segmentation and identify regions in which
specific products have better penetration.
Furthermore, the predefined administrative boundaries (at which data are
aggregated) may not reflect the scope of a study well. If the original data
are available at the smallest aggregation level, then we can design new zones
reflecting the specific needs of the analysis. For instance, if a supermarket
chain maintains a large customer database including addresses, demographic
variables and product preferences, then new zones can be designed for
tailored analysis, thus avoiding the ecological fallacy and the modifiable areal
unit problem (see Section 1.3) that may arise if data are aggregated to a
higher-level predefined zone (e.g., postcode). In particular, designing zones at
various geographical scales also allows for testing the modifiable areal unit
problem.
There is another large class of problems for which regionalization methods
are useful. When we study rare events (e.g., crime events, such as homicides, or
health issues, such as AIDS), we might encounter the problem of relatively few
events existing in our case study area. This is also called the small population
(numbers) problem (Wang). In this type of problem, aggregating
events to predefined zones (e.g., postcodes or census tracts) would most likely
leave a large majority of the zones with no events and the remaining ones with
only a small number of events (maybe one or two on average).
Applying typical spatial statistics would yield unreliable estimates. For
example, spatial autocorrelation or hot spot analysis cannot be performed
when most zones contain zero events. With regionalization, though, as similar
areas are merged into new regions, spatial autocorrelation is less of a concern in
the newly defined zones (Wang).
Additionally, ordinary least squares (OLS) regression is not suitable for this
type of problem, as two basic assumptions of OLS are violated, namely the
homogeneity of error variance (because prediction errors are larger in
polygons with fewer events) and the normal distribution of errors (see also Chapter 6;
Wang). Although there are many different approaches to dealing
with the small population problem (e.g., using counts instead of rates,
Poisson regression instead of OLS, floating catchment areas, kernel density
estimation, locally weighted averages and adaptive spatial filtering), regionalization
methods are considered very efficient (Wang).
There are many approaches to regionalization (Assunção et al.,
Duque et al.). The SKATER and REDCAP methods are
presented below.
SKATER Method
Definition
SKATER (Spatial "K"luster Analysis by Tree Edge Removal) is an algorithm
that uses a connectivity graph, built through a minimum spanning tree, to create
spatially constrained clusters based on a set of variables for a predefined
number of clusters (Assunção et al.).
Why Use
The SKATER algorithm is used to perform spatially constrained multivariate
clustering.
Interpretation
The spatial features belonging to the same cluster are both contiguous and
homogeneous.
Discussion and Practical Guidelines
There are various spatial constraints that can be used, such as contiguity edges, contiguity
edges-corners, k-nearest neighbors, Delaunay triangulation and predefined
spatial weights (see Sections 1.6 and 1.8). Once a spatial constraint is set, a
proximity matrix is created. The minimum spanning tree algorithm is then
applied (using the proximity matrix) to create a connectivity graph and a
minimum spanning tree that represent (a) the relationships among neighboring
features and (b) the similarity of the features. Each feature is represented as a
node in the tree and is connected to other features through branches called
edges. Each edge gets a weight that is proportional to the similarity between
the features it connects (ESRI).
The SKATER algorithm prunes the graph to obtain contiguous clusters (ESRI).
The algorithm starts by cutting the tree into two parts that form two
well-separated clusters (minimizing dissimilarity within the new clusters). Then it
divides each part separately, creating new clusters until the
total number of clusters set initially is reached. Each division is made so that
the separation between clusters and the similarity within clusters are
maximized.
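The tree-building stage can be sketched with SciPy's `minimum_spanning_tree` on a hypothetical five-unit contiguity graph whose edges are weighted by attribute dissimilarity (a sketch of the idea, not ESRI's implementation):

```python
# Sketch of SKATER's first stage: a contiguity graph weighted by
# attribute dissimilarity, reduced to its minimum spanning tree, which
# the algorithm then prunes into contiguous clusters.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

attrs = np.array([1.0, 1.2, 5.0, 5.3, 9.0])       # one attribute per unit
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2)]  # contiguity (hypothetical)

n = len(attrs)
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = abs(attrs[i] - attrs[j])            # dissimilarity weight

mst = minimum_spanning_tree(csr_matrix(W))        # n-1 = 4 edges remain
print(mst.toarray())
# Cutting the heaviest remaining edges yields contiguous, homogeneous
# clusters, e.g. {0,1} vs {2,3,4} after one cut.
```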
The size of the clusters can be set by either a count (i.e., the minimum or
maximum number of features that a cluster should have) or the sum of a
variable (e.g., the total human population). For example, in the case of districts,
we can create larger administrative zones where each one should contain at least
10 districts and a population between given minimum and maximum totals. There
is always a chance that such constraints cannot be met for all clusters, because
of the way the minimum spanning tree has been constructed or because
the maximum and minimum constraints are very close to each other. In this
case, the clusters that do not fulfill the criteria should be reported.
As SKATER is a heuristic algorithm (heuristics are used to find solutions
when traditional methods are too slow or when finding the optimal
solution is infeasible), it cannot guarantee an optimal solution. In practice,
this means that different runs of the algorithm are likely to produce different solutions,
and consequently a spatial feature might belong to different regions in each
run (a "run" involves the entire process of the algorithm with all iterations
completed and should not be confused with a single iteration of the algorithm). To
account for this problem, we calculate the probability of cluster membership
for each feature using permutations of random spanning trees. By defining a
number of permutations, we define the number of random spanning trees to
be created. SKATER runs for each spanning tree, and the frequency
with which each spatial object is assigned to each cluster is recorded (e.g., 99 times
out of 100, an object is placed in cluster A). A high membership probability
typically means that a feature is highly likely to belong to the specific
cluster finally assigned by the SKATER algorithm (in other words, the feature is assigned
to the same group in most of the permutations). A low probability
typically indicates that the specific feature switches between groups across
permutations and that the final assignment is not reliable. The number of
poorly assigned objects should be kept to a minimum.
In general, three basic settings have to be defined to run SKATER, namely
the spatial constraint method, the optimal number of clusters and the variables
to include (ESRI). Here are some guidelines:
• The polygon contiguity options (contiguity edges, contiguity edges
corners) are not appropriate in the case of island polygons (noncontiguous
polygons). In such cases, k-nearest neighbors and Delaunay
triangulation are preferred.
• Trimmed Delaunay triangulation can also be used to ensure that neighbors
lie inside a convex hull. Features outside the convex hull are not
assigned as neighbors. This method is suitable in the case of spatial
outliers.
• A weight matrix can also be used to include user-defined weights
that may also reflect time constraints. Note, however, that the algorithm does not
use the weight values directly, as it needs a binary definition of contiguity.
If the weights are 0 or 1, the algorithm performs in the usual
fashion. In cases where inverse distance is used without a cutoff point,
all features get some weight and are treated as neighbors. As such, when
using weights, there should be either a cutoff point or a binary
representation.

• For additional temporal analysis, we can add variables containing time,
such as night, day or the day of the week. The algorithm will then be
forced to include temporal distances.
• Similarly, we can add a spatial variable such as distance from the town
center, distance from major roads, slope or land cover/land use type
(Grekousis et al. 2015b). Including such variables will probably reinforce
the spatial clustering process.
• The pseudo F-statistic can also be used to assess the optimal
number of clusters (as in the k-means algorithm). It can also be used to
identify the most effective spatial constraint method, as long as the
variables analyzed are the same in each trial.
• Choosing the most appropriate variables is similar to the k-means
algorithm (see Section 5.5). Variables should be related to the problem in
question and exhibit a high R², which reflects how much of the original
variation of the variable (for the entire dataset) was retained after the
clustering process. It is better to begin with a few variables and progressively
add them one by one, to better understand how each variable
contributes to the separation of the data. Moreover, the selected variables should be
standardized (see Section 2.4), as variables with large variance tend to have
substantial influence on the clusters compared to variables with small
variance (most of the time this is done automatically by the software used).
• When calculating the variables' values in the new regions, we have to be
cautious about how we treat percentages, averages and index
values. If we have the absolute value of a variable (e.g., human
population), the new value of the region is just the sum of the variable's
values over the aggregated polygons. In the case of percentages, though,
we cannot simply sum them or calculate an average. We should first transform
the percentages to their absolute values and then calculate the aggregated
ones reflecting the region. Likewise, for an index or an average value
(e.g., cars per capita), we cannot sum the values. We should calculate a
weighted average X̄ as (Wang) (5.10):

X̄ = ∑_{i=1}^{n} w_i X_i / ∑_{i=1}^{n} w_i    (5.10)

where w_i is the weight (for example, the total population) in the i-th polygon,
X_i is the variable value (e.g., number of cars per capita) in the i-th polygon and
n is the number of polygons that have been merged to create the new region.
The final weighted average X̄ reflects the value of the variable (cars per capita)
in the new region. This formula is similar to the one used for the weighted
mean center presented in Section 3.1. Different weights can be used according
to the problem in question.
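The weighted-average aggregation above can be sketched with hypothetical values for three merged polygons:

```python
# Sketch: aggregating a rate (cars per capita) into a new region with a
# population-weighted average. The polygon values are hypothetical.
import numpy as np

pop = np.array([10_000, 25_000, 5_000])   # weights w_i (population)
cars_pc = np.array([0.30, 0.45, 0.20])    # X_i (cars per capita)

region_rate = (pop * cars_pc).sum() / pop.sum()
naive = cars_pc.mean()                    # unweighted mean — misleading
print(f'weighted: {region_rate:.3f}  vs  naive: {naive:.3f}')
# → weighted: 0.381  vs  naive: 0.317
```

The weighted figure equals total cars divided by total population, which is exactly the rate of the merged region; the naive mean ignores how many people each polygon contributes.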
• As the method calculates the probability of cluster membership, we
should also set the number of permutations, which defines the number of
random spanning trees to be created. As this analysis takes a considerable
amount of time, it is advisable to first define the optimal number of
clusters and then perform the permutations.
• Graphs and plots can be used to explore the results further.
REDCAP Method
Definition
REDCAP (Regionalization with dynamically constrained agglomerative clustering
and partitioning) is a set of regionalization methods (Guo et al.).
Why Use
REDCAP is used to perform spatially constrained agglomerative clustering and
partitioning.
Interpretation
The features belonging to the same cluster are both contiguous and
homogeneous.
Discussion and Practical Guidelines
REDCAP creates homogeneous regions by aggregating contiguous areas
based on attribute similarity. REDCAP operates at two levels (Wang).
At the first level, hierarchical clustering is performed under spatial
constraints, following a bottom-up approach. At the second level, the tree
constructed at the first level (as a result of the hierarchical clustering) is
partitioned following a top-down approach. The Rook's contiguity method
(shared edge) is used to define whether any two features are contiguous. Like
SKATER, REDCAP can also adopt attribute constraints, such as a minimum or
maximum regional population or a minimum number of events contained in each
region (see Box 5.5). Guidelines similar to those given for SKATER can be used to
guide the process.
Box 5.5 The REDCAP toolkit can be downloaded from www.spatialdatamining.org/software/redcap.
ArcGIS offers a toolbox for district design
(www.esri.com/software/arcgis/extensions/districting/download). This toolbox
does not create regions automatically as SKATER and REDCAP do, but it offers
a rich variety of tools for experimenting with different scenarios and plans
based on the available socioeconomic data. It is a spatial
planning toolkit that can be used to evaluate the results of other automated
methods and cross-compare them with human-made scenarios. It also offers a
sense of control by integrating the knowledge of planners and other
stakeholders into the decision process.

5.7 Density-Based Clustering: DBSCAN, HDBSCAN, OPTICS
Definition
Density-based clustering algorithms regard as clusters those points that concentrate
in a geographical region (high density), while they label as noise (low density)
points with no neighbors at a close distance (Halkidi et al.).
Why Use
Density-based clustering algorithms are used to perform spatial clustering of
point features and to better handle spatial outliers (noise), especially when large
amounts of point data are analyzed (e.g., big data).
Interpretation
Objects that lie inside a cluster are more similar to each other than to objects belonging
to different clusters.
Discussion and Practical Guidelines
There are three main unsupervised machine-learning algorithms for
density-based clustering: DBSCAN, HDBSCAN and OPTICS. These algorithms
are typically applied in N-dimensional space, but from the geographical
perspective they are applied in two-dimensional space to analyze
the spatial distribution of points.
DBSCAN (density-based spatial clustering of applications with noise), in the
context of spatial analysis, uses only two parameters, namely the maximum
search radius and the minimum number of points to be included within this radius
(Ester et al.). It is based on the calculation of geographical distances among
points, along with a threshold distance value, to calculate density. Each
point within a cluster should have a minimum number of points within a
given radius (its neighborhood); the density in the neighborhood of a point has to
exceed a threshold value. DBSCAN is a fast algorithm, but it is sensitive
to the search distance. Still, if clusters have similar densities, it performs
well. DBSCAN does not require setting the number of clusters, as k-means
does, and it can trace clusters of irregular shape.
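The two parameters map directly onto scikit-learn's DBSCAN, where `eps` is the search radius and `min_samples` the minimum points within it (illustrative coordinates; assumes scikit-learn is installed):

```python
# Sketch: DBSCAN on 2-D point coordinates. Points labeled -1 are noise.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(7)
pts = np.vstack([
    rng.normal((0, 0), 0.2, (40, 2)),   # dense cluster 1
    rng.normal((5, 5), 0.2, (40, 2)),   # dense cluster 2
    rng.uniform(-2, 7, (5, 2)),         # scattered low-density points
])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(pts)
print('clusters found:', len(set(labels) - {-1}))
print('noise points  :', int((labels == -1).sum()))
```

The two dense blobs become clusters regardless of their shape, while isolated points with fewer than `min_samples` neighbors within `eps` are flagged as noise.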
HDBSCAN (hierarchical DBSCAN) applies incremental distances to partition
data into meaningful clusters while removing noise (Campello). It is an
extension of DBSCAN that uses a hierarchical clustering algorithm. Compared
to DBSCAN and OPTICS, it is the most data-driven of the three, and in this sense
human interaction is kept to a minimum.
OPTICS (ordering points to identify the clustering structure) is another
density-based algorithm that creates a reachability plot used to separate clusters
from noise (Ankerst et al.). This algorithm is an extension of DBSCAN
designed to better identify clusters when density fluctuates.
The reachability graph plots on the x-axis the points ordered as processed by